INTRODUCTION


Probability is the branch of mathematics concerning numerical descriptions of how
likely an event is to occur or how likely it is that a proposition is true. Put
simply, it measures how likely something is to happen. The probability of an event is a number
between 0 and 1, where, roughly speaking, 0 indicates impossibility of the event 
and 1 indicates certainty. The higher the probability of an event, the more likely it 
is that the event will occur. These concepts have been given an axiomatic 
mathematical formalization in probability theory, which is used widely in the areas 
of mathematics, statistics, finance, gambling, science (in particular physics), artificial 
intelligence/machine learning, computer science, game theory, and philosophy to 
draw inferences about the expected frequency of events. Probability theory is also 
used to describe the underlying mechanics and regularities of complex systems. 


‘Probability’ and ‘Statistics’ are fundamentally related. Probability theory
describes statistical phenomena and analyses them in order to study correlation and
regression, sampling methods, business decisions, statistical inference, etc.
Probability is regarded as the theory of chance, whereas statistics is regarded as a
mathematical science pertaining to the collection, analysis, interpretation or 
explanation, and presentation of data. Statistical analysis is very important for 
taking decisions and is widely used by academic institutions, the natural and social 
sciences departments, the government and business organizations. 


The word ‘Statistics’ is derived from the Latin word ‘status’ which means
a political state or government. It was originally applied in connection with kings
and monarchs collecting data on their citizenry which pertained to state wealth, 
collection of taxes, study of population, and so on. The subject of statistics is 
primarily concerned with making decisions about various disciplines of market 
and employment, such as stock market trends, unemployment rates in various 
sectors of industry, demographic shifts, interest rate and inflation rate over the 
years, and so on. To a layman, it often refers to a column of figures, or perhaps 
tables, graphs and charts related to areas, such as population, national income, 
expenditures, production, consumption, supply, demand, sales, imports, exports, 
births, deaths and accidents. 


Hence, the subject of statistics deals primarily with numerical data gathered 
from surveys or collected using various statistical methods. Its objective is to 
summarize such data, so that the summary gives us a good indication about certain 
characteristics of a population or phenomenon that we wish to study. To ensure 
that our conclusions are meaningful, it is necessary to subject our data to scientific 
analyses so that rational decisions can be made. Statistics is therefore concerned 
with proper collection of data, organization of this data into a manageable and 
presentable form, analysis and interpretation of the data into conclusions for useful 
purposes. 


The book ‘Probability and Statistics’ is divided into four blocks that are
further divided into fourteen units. These units will help you understand
probability and multivariate distributions, set theory, random variables of the discrete
type, random variables of the continuous type, some special expectations,
Chebyshev’s inequality, the correlation coefficient, special distributions and the distribution
function of random variables, binomial and related
distributions, the Poisson distribution, Gamma and Chi-Square distributions, the normal
distribution, the bivariate normal distribution, sampling theory, transformations of
variables of the continuous type (Beta, t and F distributions), distributions of order
statistics, the moment generating function technique, the distributions of X̄ and
nS²/σ², expectations of functions of random variables, limiting distributions,
convergence in distribution, convergence in probability, limiting moment generating
functions and some theorems on limiting distributions.


The book follows the Self-Instruction Mode or the SIM format wherein 
each unit begins with an ‘Introduction’ to the topic followed by an outline of the 
‘Objectives’. The content is presented in a simple, organized and comprehensive 
form interspersed with ‘Check Your Progress’ questions and answers for better 
understanding of the topics covered. A list of ‘Key Words’ along with a ‘Summary’ 
and a set of ‘Self Assessment Questions and Exercises’ is provided at the end of
each unit for effective recapitulation.
1.0 INTRODUCTION


The subject of probability is a vast one, so only the basic concepts will
be discussed in this unit. The words probability and chance are very commonly used
in day-to-day conversation, and terms such as possible, probable and likely are
used in much the same sense. Probability can be defined as a measure of the likelihood that
a particular event will occur. It is a numerical measure with a value between 0 
and 1 of such likelihood where the probability of zero indicates that the given 
event cannot occur and the probability of one assures certainty of such an 
occurrence. Probability theory helps a decision-maker to analyse a situation and 
decide accordingly. The following are a few examples of such situations: 


• What is the chance that sales will increase if the price of the product is
decreased?
• What is the likelihood that a new machine will increase productivity?
• How likely is it that a given project will be completed on time?
• What are the possibilities that a competitor will introduce a cheaper substitute
in the market?
In set theory, the members of a set are called elements. Sets having a definite
number of elements are termed finite sets. You will also learn about singleton
sets, equality of sets, subsets, the empty set or null set and the power set. The union, intersection
and complement operations on sets are loosely analogous to arithmetic operations such
as addition, multiplication and subtraction of numbers, respectively. You will learn
to draw Venn diagrams to show all the possible mathematical or logical relationships
between sets. Mathematical logic is the analysis of language that helps you to
identify valid arguments. You will learn about classes of sets, the counting principle
and duality.


In this unit, you will study probability, set theory, conditional
probability and independence.


1.1 OBJECTIVES 


After going through this unit, you will be able to: 
• Explain probability, conditional probability and independence
• Describe set theory


1.2 PROBABILITY 


Probability theory is also called the theory of chance, and probabilities can be
derived mathematically using standard formulas. A probability is expressed as a real number
p ∈ [0, 1]; it can equivalently be expressed as a percentage (0 per cent
to 100 per cent) instead of a decimal. For example, a probability of 0.55 is
expressed as 55 per cent. When we say that the probability is 100 per cent, it
means that the event is certain, while a probability of 0 per cent means that the
event is impossible. We can also express the probability of an outcome in ratio
format, as odds. For example, if we have two probabilities, the ‘chance of winning’ (1/4)
and the ‘chance of not winning’ (3/4), then using the mathematical formula for odds,
we can say,


‘chance of winning’ : ‘chance of not winning’ = 1/4 : 3/4 = 1 : 3, or 1/3
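
The conversion between a probability and odds can be checked with a few lines of Python. This is only a minimal sketch: the function names odds_in_favour and probability_from_odds are illustrative choices, not standard terminology from this unit, and exact fractions are used so that 1/4 : 3/4 reduces to 1 : 3 without rounding.

```python
from fractions import Fraction

def odds_in_favour(p):
    """Odds in favour of an event, p : (1 - p), returned as a single ratio."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)

def probability_from_odds(odds):
    """Inverse conversion: odds of a : b correspond to probability a / (a + b)."""
    return odds / (1 + odds)

chance_of_winning = Fraction(1, 4)
odds = odds_in_favour(chance_of_winning)
print(odds)                         # 1/3, i.e. odds of 1 : 3
print(probability_from_odds(odds))  # 1/4
```

Note that odds of 1 : 3 and a probability of 1/3 are different quantities; the ratio 1/3 above is the odds, while the probability of winning remains 1/4.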


We use probability in vague terms when we predict something about the future.
For example, we might say that it will probably rain tomorrow or that it will probably be a
holiday the day after. This is a subjective probability on the part of the person predicting, but it
implies that the person believes the probability is greater than 50 per cent.


Different types of probability theories are: 
(i) Classical Theory of Probability 
(ii) Axiomatic Probability Theory 
(iii) Empirical Probability Theory 


1.2.1 Classical Theory of Probability 


The classical theory of probability is based on the number of favourable
outcomes and the total number of outcomes. The probability is expressed as the
ratio of these two numbers. The term ‘favourable’ is not a subjective value given
to the outcomes, but is rather the classical terminology used to indicate that an
outcome belongs to a given event of interest.


Classical Definition of Probability: If the number of outcomes belonging to an
event E is N_E, and the total number of outcomes is N, then the probability of event
E is defined as

$$P(E) = \frac{N_E}{N}.$$


For example, a standard pack of cards (without jokers) has 52 cards. If we
randomly draw a card from the pack, we can think of each card as a possible
outcome. Therefore, there are 52 total outcomes. Considering some events of interest
and their probabilities, we have the following:


• Out of the 52 cards, there are 13 clubs. Therefore, if the event of interest is
drawing a club, there are 13 favourable outcomes, and the probability of
this event becomes: 13/52 = 1/4.
• There are 4 kings (one of each suit). The probability of drawing a king is:
4/52 = 1/13.
• What is the probability of drawing a king or a club? This example is slightly
more complicated. We cannot simply add together the number of outcomes
for each event separately (4 + 13 = 17) as this inadvertently counts one of
the outcomes twice (the king of clubs). The correct answer is 16/52 = 4/13, from

13/52 + 4/52 - 1/52 = 16/52

We have this from the probability equation, p(club) + p(king) - p(king of
clubs).
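
The three card probabilities above can be reproduced by listing the 52 outcomes and counting the favourable ones. The following is a minimal sketch of the classical definition; the rank and suit labels and the helper name classical_probability are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = set(product(RANKS, SUITS))  # 52 equally likely outcomes

def classical_probability(event, sample_space):
    """Favourable outcomes divided by total outcomes (equiprobable case only)."""
    return Fraction(len(event & sample_space), len(sample_space))

clubs = {card for card in DECK if card[1] == "clubs"}
kings = {card for card in DECK if card[0] == "K"}

p_club = classical_probability(clubs, DECK)
p_king = classical_probability(kings, DECK)
# The set union counts the king of clubs once, so no double counting occurs.
p_king_or_club = classical_probability(kings | clubs, DECK)

print(p_club, p_king, p_king_or_club)  # 1/4 1/13 4/13
```

Counting the union of the two sets gives 16 cards directly, which agrees with 13 + 4 - 1.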


Classical probability has limitations, because this definition of probability
implicitly assumes all outcomes to be equiprobable, so it can only be used
for situations such as drawing cards, rolling dice, or pulling balls from urns.
We cannot use it to calculate probabilities where the outcomes have unequal
probabilities.


These limitations do not make the classical theory of probability useless.
It remains an important guide for calculating probabilities in situations such as
those mentioned above, and for motivating
the axiomatic approach to probability.
Frequency of Occurrence


This approach to probability is widely used in a wide range of scientific disciplines.
It is based on the idea that the underlying probability of an event can be measured
by repeated trials.


Probability as a Measure of Frequency: Let n_A be the number of times
event A occurs in n trials. We define the probability of event A as,

$$P_A = \lim_{n \to \infty} \frac{n_A}{n}$$

It is not possible to conduct an infinite number of trials. However, it usually
suffices to conduct a large number of trials, where the standard of large depends
on the probability being measured and how accurate a measurement we need.


A Note on this Definition of Probability: Nothing guarantees that the sequence in the
limit will converge to the same result every time, or that it will converge at all.
To understand this, let
us consider an experiment consisting of flipping a coin an infinite number of times,
where we are interested in the probability that heads comes up. The result may appear as the
following sequence:


HTHHTTHHHHTTTTHHHHHHHHTTTTTTTTHHHHHHHHHHHHHHHHTTTTTTTTTTTTTTTT...


Here each run of k heads and k tails is followed by another
run of twice the length. For this example, the sequence n_A/n oscillates between
1/2 (at the end of each run of tails) and values approaching 2/3 (at the end of each
run of heads), and so does not converge. We might expect such sequences to be unlikely, and
we would be right. However, the example shows that the limit in the definition given
above does not express convergence in the ordinary
sense, but rather some kind of convergence in probability. The problem of
formulating this exactly belongs to axiomatic probability theory.
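
The behaviour of the running fraction of heads for the run-doubling sequence shown above (runs of length 1, 1, 2, 2, 4, 4, ...) can be tabulated exactly. The sketch below is illustrative only; the function name running_head_fractions is an assumption, and only the end of each run is inspected.

```python
from fractions import Fraction

def running_head_fractions(num_blocks):
    """Fraction of heads at the end of each run in H T HH TT HHHH TTTT ...

    Block k consists of 2**k heads followed by 2**k tails. Two lists are
    returned: the fractions just after each run of heads and just after
    each run of tails.
    """
    heads = total = 0
    after_heads, after_tails = [], []
    for k in range(num_blocks):
        run = 2 ** k
        heads += run          # a run of heads
        total += run
        after_heads.append(Fraction(heads, total))
        total += run          # a run of tails of the same length
        after_tails.append(Fraction(heads, total))
    return after_heads, after_tails

after_heads, after_tails = running_head_fractions(12)
print([str(f) for f in after_heads[:4]])  # ['1', '3/4', '7/10', '15/22']
print(float(after_heads[-1]))             # close to 2/3
print(set(after_tails))                   # {Fraction(1, 2)}
```

The fraction returns to exactly 1/2 after every run of tails and climbs towards 2/3 after every run of heads, so the sequence of relative frequencies has no limit.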


1.2.2 Axiomatic Probability Theory 


The axiomatic probability theory is the most general approach to probability, and 
is used for more difficult problems in probability. We start with a set of axioms, 
which serve to define a probability space. These axioms are not immediately intuitive 
and are developed using the classical probability theory. 
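
For reference, the axioms most commonly used for this purpose are Kolmogorov's axioms. The notation below ($\Omega$ for the sample space, $\mathcal{F}$ for the collection of events and $P$ for the probability measure) is assumed here for illustration and is not taken from this unit.

$$
\begin{aligned}
&\text{(i)}\quad P(E) \ge 0 \ \text{ for every event } E \in \mathcal{F},\\
&\text{(ii)}\quad P(\Omega) = 1,\\
&\text{(iii)}\quad P\Big(\bigcup_{i=1}^{\infty} E_i\Big) = \sum_{i=1}^{\infty} P(E_i) \ \text{ for pairwise mutually exclusive events } E_1, E_2, \ldots
\end{aligned}
$$

The classical definition satisfies all three axioms when every outcome is equally likely, which is one sense in which the axioms generalize the classical theory.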


1.2.3 Empirical Probability Theory 


The empirical approach to determining probabilities relies on data from actual
experiments to determine approximate probabilities instead of the assumption of
equal likelihood. Probabilities in these experiments are defined as the ratio of the
frequency of occurrence of an event, f(E), to the number of trials in the
experiment, n, written symbolically as P(E) = f(E)/n. For example, while flipping
a coin, the empirical probability of heads is the number of heads divided by the
total number of flips.


The relationship between these empirical probabilities and the theoretical
probabilities is suggested by the Law of Large Numbers. The law states that as
the number of trials of an experiment increases, the empirical probability approaches
the theoretical probability. Hence, if we roll a fair die a large number of times, each number
would come up approximately 1/6 of the time. The study of empirical probabilities
is known as statistics.
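
The die example can be explored with a small simulation. The sketch below assumes a fair die simulated with Python's pseudo-random generator and a fixed seed (an arbitrary choice made only for repeatability); the function name empirical_probabilities is illustrative.

```python
import random
from collections import Counter

def empirical_probabilities(num_rolls, seed=0):
    """Return P(E) = f(E)/n for each face of a simulated fair die."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) for _ in range(num_rolls))
    return {face: counts[face] / num_rolls for face in range(1, 7)}

for n in (60, 6000, 60000):
    probs = empirical_probabilities(n)
    # Largest gap between an empirical probability and the theoretical 1/6.
    worst_gap = max(abs(p - 1 / 6) for p in probs.values())
    print(n, round(worst_gap, 4))
```

The largest gap typically shrinks as the number of rolls grows, which is the behaviour the Law of Large Numbers describes; any single run can still deviate.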


1.2.4 Addition Rule 


When two events are mutually exclusive, then the probability that either of the 
events will occur is the sum of their separate probabilities. For example, if you roll 
a single die then the probability that it will come up with a face 5 or face 6, where 
event A refers to face 5 and event B refers to face 6, both events being mutually 
exclusive events, is given by, 
P[A or B] = P[A] + P[B]
or P[5 or 6] = P[5] + P[6]
             = 1/6 + 1/6
             = 2/6 = 1/3
P[A or B] is written as P[A ∪ B] and is known as P[A union B].
However, if events A and B are not mutually exclusive, then the probability
of occurrence of either event A or event B or both is equal to the probability
that event A occurs plus the probability that event B occurs minus the probability
that outcomes common to both A and B occur.


Symbolically, it can be written as,
P[A ∪ B] = P[A] + P[B] - P[A and B]
P[A and B] can also be written as P[A ∩ B], known as P[A intersection
B] or simply P[AB].
The event [A and B] consists of all those outcomes which are contained in both
A and B simultaneously. For example, in an experiment of taking cards out of
a pack of 52 playing cards, assume that:


Event A = An ace is drawn.
Event B = A spade is drawn.
Event [AB] = The ace of spades is drawn.
Hence, P[A ∪ B] = P[A] + P[B] - P[AB]
                = 4/52 + 13/52 - 1/52
                = 16/52 = 4/13


This is because there are 4 aces and 13 spades, including 1 ace of
spades, out of a total of 52 cards in the pack. The logic behind subtracting
P[AB] is that the ace of spades is counted twice: once in event A (4 aces) and
once again in event B (13 spades including the ace).

Another example for P[A ∪ B], where event A and event B are not mutually
exclusive, is as follows:


Suppose a survey of 100 persons revealed that 50 persons read India
Today and 30 persons read Time magazine and 10 of these 100 persons read
both India Today and Time. Then:


Event [A] = 50 (persons who read India Today)
Event [B] = 30 (persons who read Time)
Event [AB] = 10 (persons who read both)
Since the 10 persons in event [AB] are included twice, both in event A as well as in
event B, event [AB] must be subtracted once in order to determine the event
[A ∪ B], which means that a person reads India Today or Time or both.
Hence,


P[A ∪ B] = P[A] + P[B] - P[AB]
         = 50/100 + 30/100 - 10/100
         = 70/100 = 0.7
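
The survey calculation can be verified by representing the readers as sets. In the sketch below the numbering of the 100 persons is hypothetical: it is chosen only so that 50 read India Today, 30 read Time and 10 read both, as in the example.

```python
from fractions import Fraction

def prob(event, sample_space):
    """Probability of an event when every person is equally likely to be picked."""
    return Fraction(len(event), len(sample_space))

persons = set(range(100))
india_today = set(range(0, 50))   # hypothetical labels: persons 0-49
time_readers = set(range(40, 70))  # hypothetical labels: persons 40-69

both = india_today & time_readers    # persons 40-49, counted in both groups
either = india_today | time_readers  # reads India Today or Time or both

by_counting = prob(either, persons)
by_addition_rule = (prob(india_today, persons) + prob(time_readers, persons)
                    - prob(both, persons))
print(by_counting, by_addition_rule)  # 7/10 7/10
```

Counting the union directly and applying the addition rule give the same answer, because the subtraction removes exactly the persons who would otherwise be counted twice.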


1.2.5 Multiplication Rule 


The multiplication rule is applied when it is necessary to compute the probability that
both events A and B will occur at the same time. The multiplication rule is
different depending on whether the two events are independent or not
independent.

If events A and B are independent events, then the probability that they
both will occur is the product of their separate probabilities. This is a strict
condition, so that events A and B are independent if, and only if,


P[AB] = P[A] x P[B] or
      = P[A]P[B]
For example, if we toss a coin twice, then the probability that the first toss
results in a head and the second toss results in a tail is given by,
P[HT] = P[H] x P[T]
      = 1/2 x 1/2 = 1/4
However, if events A and B are not independent, meaning that the probability 
of occurrence of an event is dependent or conditional upon the occurrence or 
non-occurrence of the other event, then the probability that they will both occur 
is given by, 
P[AB] = P[A] x P[B/given the outcome of A] 
This relationship is written as: 
P[AB] = P[A] x P[B/A] = P[A] P[B/A] 


Where P[B/A] means the probability of event B on the condition that event A 
has occurred. As an example, assume that a bowl has 6 black balls and 4 white 
balls. A ball is drawn at random from the bowl. Then a second ball is drawn 
without replacement of the first ball back in the bowl. The probability of the 
second ball being black or white would depend upon the result of the first draw 
as to whether the first ball was black or white. The probability that both these 
balls are black is given by, 

P[two black balls] = P[black on 1st draw] x P[black on 2nd draw/
black on 1st draw]
= 6/10 x 5/9 = 30/90 = 1/3
This is so because at first there are 6 black balls out of a total of 10, but
if the first ball drawn is black then we are left with 5 black balls out of a total
of 9 balls.
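
The result for the bowl can be cross-checked by listing every ordered pair of distinct balls, which is what drawing twice without replacement amounts to. This is a brute-force sketch for illustration; the variable names are assumptions.

```python
from fractions import Fraction
from itertools import permutations

bowl = ["black"] * 6 + ["white"] * 4

# Every ordered pair of two different balls: 10 x 9 = 90 equally likely draws.
draws = list(permutations(range(len(bowl)), 2))
both_black = [(i, j) for i, j in draws
              if bowl[i] == "black" and bowl[j] == "black"]

p_by_enumeration = Fraction(len(both_black), len(draws))
p_by_rule = Fraction(6, 10) * Fraction(5, 9)  # P[A] x P[B/A]

print(len(both_black), len(draws))   # 30 90
print(p_by_enumeration, p_by_rule)   # 1/3 1/3
```

The 30 favourable pairs out of 90 match the 30/90 obtained from the multiplication rule for dependent events.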


1.2.6 Bayes’ Theorem 


Reverend Thomas Bayes (1702—1761) introduced his theorem on probability 
which is concerned with a method for estimating the probability of causes which 
are responsible for the outcome of an observed effect. Being a religious preacher 
himself, as well as a mathematician, his motivation for the theorem came from his 
desire to prove the existence of God by looking at the evidence of the world that 
God created. He was interested in drawing conclusions about the causes by 
observing the consequences. The theorem contributes to the statistical decision 
theory in revising prior probabilities of outcomes of events based upon the 
observation and analysis of additional information. 


Bayes’ theorem makes use of conditional probability formula where the 
condition can be described in terms of the additional information which would 
result in the revised probability of the outcome of an event. 


Suppose that there are 50 students in our statistics class out of which 20 are 
male students and 30 are female students. Out of the 30 females, 20 are Indian 
students and 10 are foreign students. Out of the 20 male students, 15 are Indians 
and 5 are foreigners, so that out of all the 50 students, 35 are Indians and 15 are 
foreigners. This data can be presented in a tabular form as follows: 


          Indian   Foreigner   Total
Male        15         5         20
Female      20        10         30
Total       35        15         50


Based upon this information, the probability that a student picked at
random will be female is 30/50 or 0.6, since there are 30 females in the total class
of 50 students. Now, suppose that we are given additional information that the
person picked at random is Indian, then what is the probability that this person
is a female? This additional information will result in a revised probability or posterior
probability, in the sense that it is assigned to the outcome of the event after this
additional information is made available.

We are interested in the revised probability of picking a female student
at random, given that we know that the student is Indian. Let A_1 be the event
female, A_2 be the event male and B the event Indian. Then based upon our
knowledge of conditional probability, Bayes’ theorem can be stated as follows,

$$P(A_1/B) = \frac{P(A_1)\,P(B/A_1)}{P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2)}$$

In the example discussed here, there are 2 basic events which are A_1 (female)
and A_2 (male). However, if there are n basic events, A_1, A_2, ..., A_n, then Bayes’
theorem can be generalized as,

$$P(A_1/B) = \frac{P(A_1)\,P(B/A_1)}{P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2) + \cdots + P(A_n)\,P(B/A_n)}$$

Solving the case of 2 events we have,

$$P(A_1/B) = \frac{(30/50)(20/30)}{(30/50)(20/30) + (20/50)(15/20)} = \frac{20}{35} = \frac{4}{7} \approx 0.57$$

This example shows that while the prior probability of picking a female
student is 0.6, the posterior probability becomes 0.57 after the additional
information that the student is Indian is incorporated in the problem.
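
The posterior probability for the class of 50 students can be computed directly from the prior and conditional probabilities. The following is a minimal sketch of Bayes’ theorem for a finite set of basic events; the function name bayes_posterior and the dictionary keys are illustrative assumptions.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods, target):
    """P(target/B) from the priors P(A_i) and the likelihoods P(B/A_i)."""
    evidence = sum(priors[a] * likelihoods[a] for a in priors)  # P(B)
    return priors[target] * likelihoods[target] / evidence

# Priors from the table: 30 females and 20 males out of 50 students.
priors = {"female": Fraction(30, 50), "male": Fraction(20, 50)}
# P(Indian/female) = 20/30 and P(Indian/male) = 15/20.
p_indian_given = {"female": Fraction(20, 30), "male": Fraction(15, 20)}

posterior_female = bayes_posterior(priors, p_indian_given, "female")
print(posterior_female)                   # 4/7
print(round(float(posterior_female), 2))  # 0.57
```

The same answer follows from the table directly, since 20 of the 35 Indian students are female.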


Another example of application of Bayes’ theorem is as follows: 


Example 1. A businessman wants to construct a hotel in New Delhi. He generally 
builds three types of hotels. These are 50 rooms, 100 rooms and 150 rooms hotels, 
depending upon the demand for the rooms, which is a function of the area in which 
the hotel is located, and the traffic flow. The demand can be categorized as low, 
medium or high. Depending upon these various demands, the businessman has made 
some preliminary assessment of his net profits and possible losses (in thousands of 
dollars) for these various types of hotels. These pay-offs are shown in the following 
table. 


                              States of Nature (Demand for Rooms)
                              Low (A_1)   Medium (A_2)   High (A_3)
Demand Probability               0.2          0.5           0.3
Number of Rooms  R_1 = (50)       25           35            50
                 R_2 = (100)     -10           40            70
                 R_3 = (150)     -30           20           100


Solution. The businessman has also assigned ‘prior probabilities’ to the demand
structure for rooms. These probabilities reflect the initial judgement of the
businessman based upon his intuition and his degree of belief regarding the outcomes
of the states of nature.


Demand for Rooms    Probability of Demand
Low (A_1)                   0.2
Medium (A_2)                0.5
High (A_3)                  0.3


Based upon these values, the expected pay-offs (in thousands of dollars) for the
various hotel sizes can be computed as follows,


EV (50) = (25 x 0.2) + (35 x 0.5) + (50 x 0.3) = 37.50

EV (100) = (-10 x 0.2) + (40 x 0.5) + (70 x 0.3) = 39.00

EV (150) = (-30 x 0.2) + (20 x 0.5) + (100 x 0.3) = 34.00
This gives us the maximum expected pay-off of $39,000 for building a 100-room hotel.
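
The three prior expected pay-offs can be reproduced from the pay-off table and the prior probabilities. The sketch below uses exact fractions to avoid rounding; the dictionary layout and the function name expected_value are assumptions made for illustration.

```python
from fractions import Fraction as F

# Prior probabilities of the states of nature (demand for rooms).
prior = {"low": F("0.2"), "medium": F("0.5"), "high": F("0.3")}

# Net pay-offs in thousands of dollars, keyed by number of rooms.
payoff = {
    50:  {"low": 25,  "medium": 35, "high": 50},
    100: {"low": -10, "medium": 40, "high": 70},
    150: {"low": -30, "medium": 20, "high": 100},
}

def expected_value(payoffs, probs):
    """Probability-weighted average of the pay-offs over the states of nature."""
    return sum(probs[state] * payoffs[state] for state in probs)

ev = {rooms: expected_value(row, prior) for rooms, row in payoff.items()}
best_rooms = max(ev, key=ev.get)

for rooms, value in ev.items():
    print(rooms, float(value))  # 50 37.5, then 100 39.0, then 150 34.0
print(best_rooms)               # 100
```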


Now the hotelier must decide whether to gather additional information
regarding the states of nature, so that these states can be predicted more accurately
than the preliminary assessment. The basis of such a decision would be the cost of
obtaining additional information. If this cost is less than the increase in maximum
expected profit, then such additional information is justified.


Suppose that the businessman asks a consultant to study the market and
predict the states of nature more accurately. This study is going to cost the
businessman $10,000. This cost would be justified if the maximum expected profit
with the new states of nature is at least $10,000 more than the expected pay-off with
the prior probabilities. The consultant made some studies and came up with the
estimates of low demand (X_1), medium demand (X_2), and high demand (X_3) with
a degree of reliability in these estimates. This degree of reliability is expressed as a
conditional probability, for example the probability that the consultant’s estimate will be low
demand when the demand is actually low. Similarly, there will be
a conditional probability of the consultant’s estimate of medium demand when the
demand is actually low, and so on. These conditional probabilities are expressed in
Table 1.1.


Table 1.1 Conditional Probabilities


                        X_1     X_2     X_3
States of     (A_1)     0.5     0.3     0.2
Nature        (A_2)     0.2     0.6     0.2
(Demand)      (A_3)     0.1     0.3     0.6

The values in the preceding table are conditional probabilities and are 
interpreted as follows: 


The upper north-west value of 0.5 is the probability that the consultant’s
prediction will be for low demand (X_1) when the demand is actually low. Similarly,
the probability is 0.3 that the consultant’s estimate will be for medium demand
(X_2) when in fact the demand is low, and so on. In other words, P(X_1/A_1) = 0.5
and P(X_2/A_1) = 0.3. Similarly, P(X_1/A_2) = 0.2 and P(X_2/A_2) = 0.6, and so on.


Our objective is to obtain posteriors which are computed by taking the
additional information into consideration. One way to reach this objective is to
first compute the joint probability, which is the product of the prior probability and the
conditional probability for each state of nature. The joint probabilities so computed are
given as,


States      Prior                        Joint Probabilities
of Nature   Probability   P(A_i X_1)          P(A_i X_2)          P(A_i X_3)
A_1         0.2           0.2 x 0.5 = 0.10    0.2 x 0.3 = 0.06    0.2 x 0.2 = 0.04
A_2         0.5           0.5 x 0.2 = 0.10    0.5 x 0.6 = 0.30    0.5 x 0.2 = 0.10
A_3         0.3           0.3 x 0.1 = 0.03    0.3 x 0.3 = 0.09    0.3 x 0.6 = 0.18
Total marginal probabilities    = 0.23              = 0.45              = 0.32


Now, the posterior probabilities for each state of nature A_i are calculated as
follows:

$$P(A_i/X_j) = \frac{\text{Joint probability of } A_i \text{ and } X_j}{\text{Marginal probability of } X_j}$$

By using this formula, the joint probabilities are converted into posterior
probabilities and the computed table for these posterior probabilities is given as,


States of Nature                  Posterior Probabilities
              P(A_i/X_1)            P(A_i/X_2)            P(A_i/X_3)
A_1           0.10/0.23 = 0.435     0.06/0.45 = 0.133     0.04/0.32 = 0.125
A_2           0.10/0.23 = 0.435     0.30/0.45 = 0.667     0.10/0.32 = 0.312
A_3           0.03/0.23 = 0.130     0.09/0.45 = 0.200     0.18/0.32 = 0.563
Total         = 1.0                 = 1.0                 = 1.0
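
The joint, marginal and posterior tables can be generated from the priors and the conditional probabilities of Table 1.1. The sketch below is illustrative: the function name posterior_table is an assumption, and the value P(X_1/A_3) = 0.1 is taken from the product 0.3 x 0.1 shown in the joint probability table.

```python
from fractions import Fraction as F

priors = {"A1": F("0.2"), "A2": F("0.5"), "A3": F("0.3")}

# P(X_j/A_i): reliability of the consultant's estimates (Table 1.1).
# The 0.1 for (A3, X1) is the factor used in the joint table, 0.3 x 0.1.
likelihood = {
    "A1": {"X1": F("0.5"), "X2": F("0.3"), "X3": F("0.2")},
    "A2": {"X1": F("0.2"), "X2": F("0.6"), "X3": F("0.2")},
    "A3": {"X1": F("0.1"), "X2": F("0.3"), "X3": F("0.6")},
}

def posterior_table(priors, likelihood):
    """Return the joint, marginal and posterior probabilities."""
    estimates = ["X1", "X2", "X3"]
    joint = {a: {x: priors[a] * likelihood[a][x] for x in estimates}
             for a in priors}
    marginal = {x: sum(joint[a][x] for a in priors) for x in estimates}
    posterior = {x: {a: joint[a][x] / marginal[x] for a in priors}
                 for x in estimates}
    return joint, marginal, posterior

joint, marginal, posterior = posterior_table(priors, likelihood)
print({x: float(m) for x, m in marginal.items()})
for x, column in posterior.items():
    print(x, {a: round(float(p), 3) for a, p in column.items()})
```

Each column of posteriors sums to 1, which is a useful check on the arithmetic.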


Now, we have to compute the expected pay-offs for each course of action
with the new posterior probabilities assigned to each state of nature. The net
profits for each course of action for a given state of nature are the same as before
and are restated as follows. These net profits are expressed in thousands of dollars.


                          Low (A_1)   Medium (A_2)   High (A_3)
Number of Rooms  (R_1)       25           35            50
                 (R_2)      -10           40            70
                 (R_3)      -30           20           100


Let O_ik be the monetary outcome of course of action (i) when (k) is the
corresponding state of nature, so that in the above case O_11 will be the outcome of
course of action R_1 and state of nature A_1, which in our case is $25,000. Similarly,
O_21 will be the outcome of action R_2 and state of nature A_1, which in our case is
-$10,000, and so on. The expected value EV (in thousands of dollars) is calculated
on the basis of the actual state of nature that prevails as well as the estimate of the
state of nature as provided by the consultant. These expected values are calculated
as follows,


Course of action = R_i
Estimate of consultant = X_j
Actual state of nature = A_k
where, i, j, k = 1, 2, 3

Then

$$EV(R_i/X_j) = \sum_{k=1}^{3} P(A_k/X_j)\, O_{ik}$$

Each expected value therefore weights the pay-offs in the row of one course of
action R_i by the posterior probabilities in the column of one estimate X_j.


(A) Course of action = R_1 = Build 50 rooms hotel

EV(R_1/X_1) = 0.435(25) + 0.435(35) + 0.130(50)
            = 10.875 + 15.225 + 6.5 = 32.6

EV(R_1/X_2) = 0.133(25) + 0.667(35) + 0.200(50)
            = 3.325 + 23.345 + 10.0 = 36.67

EV(R_1/X_3) = 0.125(25) + 0.312(35) + 0.563(50)
            = 3.125 + 10.92 + 28.15 = 42.195


(B) Course of action = R_2 = Build 100 rooms hotel

EV(R_2/X_1) = 0.435(-10) + 0.435(40) + 0.130(70)
            = -4.35 + 17.4 + 9.1 = 22.15

EV(R_2/X_2) = 0.133(-10) + 0.667(40) + 0.200(70)
            = -1.33 + 26.68 + 14.0 = 39.35

EV(R_2/X_3) = 0.125(-10) + 0.312(40) + 0.563(70)
            = -1.25 + 12.48 + 39.41 = 50.64


(C) Course of action = R_3 = Build 150 rooms hotel

EV(R_3/X_1) = 0.435(-30) + 0.435(20) + 0.130(100)
            = -13.05 + 8.7 + 13.0 = 8.65

EV(R_3/X_2) = 0.133(-30) + 0.667(20) + 0.200(100)
            = -3.99 + 13.34 + 20.0 = 29.35

EV(R_3/X_3) = 0.125(-30) + 0.312(20) + 0.563(100)
            = -3.75 + 6.24 + 56.3 = 58.79


The calculated expected values, in thousands of dollars, are presented in a
tabular form.


                     Expected Posterior Pay-Offs
Outcome      EV(R_1/X_j)    EV(R_2/X_j)    EV(R_3/X_j)
X_1             32.6           22.15           8.65
X_2             36.67          39.35          29.35
X_3             42.195         50.64          58.79


This table can now be analysed in the following manner.


If the outcome is X_1, it is desirable to build the 50 rooms hotel, since the
expected pay-off for this course of action is the maximum, at $32,600. Similarly, if the
outcome is X_2, the course of action should be R_2, since the maximum pay-off
is $39,350. Finally, if the outcome is X_3, the maximum pay-off is $58,790 for course
of action R_3.

Accordingly, given these conditions and the pay-offs, the advisable hotel size
depends on the consultant’s estimate: 50 rooms if low demand is predicted, 100 rooms
if medium demand is predicted, and 150 rooms if high demand is predicted.
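
The expected posterior pay-offs can be evaluated directly from the definition EV(R_i/X_j) = sum over k of P(A_k/X_j) O_ik, using the pay-off table and the conditional probabilities of Table 1.1. The sketch below is offered under stated assumptions rather than as the book's own procedure: the names are illustrative, exact fractions are used instead of posteriors rounded to three places, and the final comparison with the $10,000 study cost is one possible way of answering the question raised in the text.

```python
from fractions import Fraction as F

priors = {"A1": F("0.2"), "A2": F("0.5"), "A3": F("0.3")}
likelihood = {  # P(X_j/A_i) from Table 1.1
    "A1": {"X1": F("0.5"), "X2": F("0.3"), "X3": F("0.2")},
    "A2": {"X1": F("0.2"), "X2": F("0.6"), "X3": F("0.2")},
    "A3": {"X1": F("0.1"), "X2": F("0.3"), "X3": F("0.6")},
}
payoff = {  # O_ik in thousands of dollars
    "R1": {"A1": 25,  "A2": 35, "A3": 50},
    "R2": {"A1": -10, "A2": 40, "A3": 70},
    "R3": {"A1": -30, "A2": 20, "A3": 100},
}

def posterior(estimate):
    """Posterior P(A_k/X_j) and marginal P(X_j) for one consultant estimate."""
    joint = {a: priors[a] * likelihood[a][estimate] for a in priors}
    marginal = sum(joint.values())
    return {a: joint[a] / marginal for a in joint}, marginal

def expected_payoffs(estimate):
    """EV(R_i/X_j) for every course of action R_i."""
    post, _ = posterior(estimate)
    return {r: sum(post[a] * payoff[r][a] for a in post) for r in payoff}

best_action = {}
value_with_study = 0
for x in ("X1", "X2", "X3"):
    ev = expected_payoffs(x)
    best_action[x] = max(ev, key=ev.get)
    # Weight the best pay-off under each estimate by how often it occurs.
    value_with_study += posterior(x)[1] * ev[best_action[x]]
    print(x, {r: round(float(v), 2) for r, v in ev.items()}, best_action[x])

value_without_study = max(
    sum(priors[a] * payoff[r][a] for a in priors) for r in payoff)
print(float(value_with_study), float(value_without_study))  # 44.0 39.0
```

Under these assumptions the consultant's information raises the expected pay-off from $39,000 to $44,000, a gain of $5,000, which is less than the $10,000 cost of the study. Because exact posteriors are used, individual values can differ in the second decimal place from figures computed with rounded posteriors.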


1.2.7 Sample Space and Events 
Sample Space 


A sample space is the collection of all possible events or outcomes of an 
experiment. For example, there are two possible outcomes of a toss of a fair 
coin, which are a head and a tail. Then, the sample space for this experiment 
denoted by S would be: 


S = [H, T]
This makes the probability of the sample space equal to 1 or,
P[S] = P[H, T] = 1


This is so because in the toss of the coin, either a head or a tail must occur. 
Similarly, when we roll a die, any of the six faces can come as a result of the 
roll, since there are a total of six faces. Hence, the sample space is S = [1, 2, 
3, 4, 5, 6], and P[S] = 1, since one of the six faces must occur. 


Events 


An event is an outcome or a set of outcomes of an activity or a result of a trial. 
For example, getting two heads in the trial of tossing three fair coins simultaneously 
would be an event. The following are the various types of events: 


Elementary Event. An elementary event, also known as a simple event, is a
single possible outcome of an experiment. For example, if we toss a fair coin, then
the event of a head coming up is an elementary event. If the symbol for an elementary
event is (E), then the probability of the event (E) is written as P[E].


Joint Event. A joint event, also known as a compound event, has two or
more elementary events in it. For example, drawing a black ace from a pack of
cards would be a joint event, since it contains two elementary events: the ace of
spades and the ace of clubs.

Mutually Exclusive Events. Two events are said to be mutually exclusive 
if the occurrence of one precludes the occurrence of another. 


All simple events are mutually exclusive. Compound events are mutually
exclusive only when they contain no simple event in common. Thus, in drawing
a card from a normal pack, the events ‘a red card’ and ‘a spade’ are mutually
exclusive as a card cannot be red and a spade at the same time. But ‘a heart’ and ‘a
king’ are not mutually exclusive as both the compound events include the elementary
event ‘King of Hearts’.
Complementary Event. Two mutually exclusive events are said to be
complementary if between themselves they exhaust all possible outcomes. Thus,
‘not Ace of Hearts’ and ‘Ace of Hearts’ are complementary events. So also are ‘no
head’ and ‘at least one head’ in repeated flipping of a coin. If one of them is
denoted by A, the other (complementary) event is denoted by Ā.


How is an Event Represented? An event is a statement about one or more outcomes
of an experiment. For example, ‘a number greater than 4 appears’ is an event for
the experiment of throwing a die.

Consider an experiment of drawing a card from a pack containing just four 
cards: Ace of Spades, Ace of Hearts, King of Spades and King of Hearts. We 
draw any one of these four cards. So there are four possible outcomes or events— 
drawing of SA, HA, SK, or HK. These events are called simple events because 
they cannot be decomposed further into two or more events. 


Any set of simple events can be represented on a diagram like Figure 1.1. The
collection of all possible simple events in an experiment is called a sample space
or a possibility space. Thus the sample space of drawing a card from the pack
described earlier consists of four points.


[Figure: four sample points labelled Heart Ace, Heart King, Spade Ace and Spade King]


Fig. 1.1 Simple Events


An event is termed compound if it represents two or more simple events. Thus, 
the event ‘Spade’ is a compound event as it represents two simple events SA and 
SK (Refer Figure 1.2). 


[Figure 1.2: the four points HA, HK, SA and SK, with SA and SK enclosed and labelled as the compound event ‘Spade’] 


Fig. 1.2 Compound Event 


Similarly, ‘not HA’ is also a compound event made up of all events except HA, 
that is, made up of SA, SK and HK (Refer Figure 1.3). 


[Figure 1.3: the four points, with every point except HA enclosed and labelled as the compound event ‘not HA’] 


Fig. 1.3 Compound Event 


The sample space may be discrete or continuous. If we are dealing with a discrete 
variable, the sample space is discrete, and if we are dealing with continuous variables 
it is continuous. The sample space for rolling two dice is a discrete one consisting 
of 36 points (Refer Figure 1.4) and that for the weights of individuals selected at 
random would be continuous. 


[Figure 1.4: a grid of 36 points for the rolling of two dice; horizontal axis: First Die (1 to 6)] 


Fig. 1.4 Discrete and Continuous Sample Space 
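
The 36-point sample space for two dice can also be listed mechanically. The following is a minimal sketch in Python; the event name sum_is_seven is only an illustrative assumption and does not come from the text. 

```python
from itertools import product

# Sample space for rolling two dice: ordered pairs (first die, second die).
sample_space = set(product(range(1, 7), repeat=2))

# A simple event is a single point; a compound event is a set of points.
# "The two faces add up to 7" is a hypothetical compound event chosen
# purely for illustration.
sum_is_seven = {(a, b) for (a, b) in sample_space if a + b == 7}

print(len(sample_space))     # number of points in the sample space
print(sorted(sum_is_seven))  # the simple events making up the compound event
```

Representing events as subsets of the sample space is the same idea as enclosing points in the figures above. 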


Sum of Events. The sum of two events A₁ and A₂ is the compound event ‘either 
A₁ or A₂ or both’, i.e., at least one of A₁ and A₂ occurs. This is denoted by A₁ + 
A₂. In general, A₁ + A₂ + ... + Aₙ is the event which means the occurrence of at 
least one of the Aᵢ’s. 

Product of Events. The product of two events A₁ and A₂ is the compound event ‘A₁ 
and A₂ both occur’. This is denoted by A₁A₂. Obviously, if A₁ and A₂ are two 
mutually exclusive events, then A₁A₂ is an impossible event. 

Suppose a person is required to calculate the probability of the occurrence of 
one outcome (simple or compound) of an experiment. One method to do this is to 
try the experiment a large number of times under exactly similar circumstances. If 
an outcome occurs m times in n trials, m/n is called its relative frequency. It is 
conventional to use the term success whenever the event under consideration 
takes place and failure whenever it does not. 

If the outcomes of the experiment are represented by a graph in which we have 
the total number of trials n on the horizontal axis and the proportion of successes 
m/n on the vertical axis, we note the following points: 

1. When n is small, the ratio m/n fluctuates considerably. 
2. When n becomes large, the ratio m/n becomes stable and tends to settle 
down to a certain value, say P. 

From these, we conclude that when an experiment is repeated a large number 
of times, the proportion of times the event occurs would be practically equal to the 
number P. 

We call the number P the probability of occurrence of the given event. 

Thus, when we talk of the probability of an event, we simply refer to the 
proportion of times that event occurs in a large number of trials or in a long run. 
This is called the relative frequency approach of defining probability. 

So the probability of getting a six in a single rolling of a die is the proportion of 
times a six would show up in a large number of rollings of a single die under 
exactly similar circumstances. 

Note carefully that P and the proportion of success m/n are not the same 
things. The ratio m/n changes with n while P does not. It is a fixed number. However, 
when n is large and P is not known, m/n is taken as an estimate of P. 
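
The settling down of m/n can be watched in a small simulation. The sketch below assumes a fair die simulated with Python's pseudo-random generator; the function name relative_frequency and the fixed seed are illustrative choices, not part of the text. 

```python
import random

def relative_frequency(n_trials, seed=0):
    """Proportion m/n of sixes in n_trials simulated rolls of a fair die."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    successes = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 6)
    return successes / n_trials

# m/n fluctuates for small n and tends to settle near P = 1/6 for large n.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

The printed ratios are estimates of P only; P itself stays fixed at 1/6 for a fair die. 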


1.2.8 Finite Probability Spaces 


A probability space is a measure space such that the measure of the whole 
space is equal to 1. A simple finite probability space is an ordered pair (S, p) such 
that S is a finite set and p is a function with domain S whose range is a subset of [0, 1] such 
that, 

        Σ_{s ∈ S} p(s) = 1 

Suppose (S, p) is a simple finite probability space. Let, 

        𝒜 = {A : A ⊆ S} 

and let, 

        P(A) = Σ_{s ∈ A} p(s)   for A ∈ 𝒜 

It can be easily verified that (S, 𝒜, P) is a probability space. 


A simple and frequently used function p is obtained by letting p(s) equal one over 
the number of elements of S for each s ∈ S. 


Definition: A finite probability space is a finite set Ω ≠ ∅ together with a function, 
Pr: Ω → R⁺ such that, 


(i) ∀ ω ∈ Ω, Pr(ω) > 0 
(ii) Σ_{ω ∈ Ω} Pr(ω) = 1 


Here, the set Ω is the sample space and the function Pr is the probability distribution. 
The elements ω ∈ Ω are called atomic events or elementary events. An event is a 
subset of Ω. The uniform distribution over the sample space is defined by setting 
Pr(ω) = 1/|Ω| for every ω ∈ Ω. This distribution defines the uniform probability 
space over Ω. In a uniform space, calculation of probabilities amounts to counting: 
Pr(A) = |A|/|Ω|. 
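
Counting in a uniform space can be sketched in a few lines of Python. The helper name uniform_probability is an assumption made for illustration; the sample space used is the four-card pack described earlier. 

```python
from fractions import Fraction

def uniform_probability(event, omega):
    """Pr(A) = |A| / |Omega| in a uniform finite probability space."""
    event, omega = set(event), set(omega)
    if not omega:
        raise ValueError("the sample space must be non-empty")
    if not event <= omega:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(omega))

omega = {"SA", "HA", "SK", "HK"}   # the four-card pack
spade = {"SA", "SK"}               # compound event 'Spade'
print(uniform_probability(spade, omega))
```

Exact fractions are used instead of floating-point numbers so that the probabilities of the elementary events add up to exactly 1. 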


Check Your Progress 


1. Define the terms simple probability and joint probability. 
2. Explain the classical theory of probability. 
3. What is the addition rule? 
4. When is the law of multiplication applied? 
5. What is Bayes' theorem? 
6. What is a mutually exclusive event? 


1.3 SET THEORY 


Sets are one of the most fundamental concepts in mathematics. A set is a collection 
of distinct objects considered as a whole. Thus we say, 


‘A set is any collection of objects such that given an object, it is possible to 
determine whether that object belongs to the given collection or not.’ 

The members of a set are called elements. We use capital letters to denote 
sets and small letters to denote elements. We always use { } brackets to denote a 
set. 

Examples of Sets: (i) The set of all integers. 
(ii) The set of all students of Delhi University. 
(iii) The set of all letters of the alphabet. 
(iv) The set of even integers 2, 4, 6, 8. 


Example 2. Let M be the collection of all those men (only those men) in a village 
who do not shave themselves. Given that, (i) All men in the village must be clean 
shaven, (ii) The village barber shaves all those men who do not shave themselves. 


Solution. Suppose b denotes the village barber. If b ∈ M, then b does not shave 
himself. Then as per given statement (ii), b shaves himself, which is a contradiction. 

If b ∉ M, then b shaves himself. Then as per the given statement (i), b does 
not shave himself, again a contradiction. 
Since we cannot answer ‘Yes’ or ‘No’ to the question, ‘Is the barber himself 
a member of M?’, we conclude that M is not a set. 


Elements 


The members of a set are called its elements. We use capital letters to denote sets 
and small letters to denote elements. If a is an element of the set A, we write it as 
a ∈ A (read as ‘a belongs to A’) and if a is not an element of the set A, we write 
it as a ∉ A (read as ‘a does not belong to A’). There are different ways of 
describing a set. For example, the set consisting of elements 1, 2, 3, 4, 5 could be 
written as {1, 2, 3, 4, 5} or {1, 2, ..., 5} or {x | x ∈ N, x ≤ 5}, 

where N = Set of natural numbers. 


We always use { } brackets to denote a set. A set which has a finite number 
of elements is called a finite set, else it is called an infinite set. For example, if A 
is the set of all integers, then A is an infinite set denoted by {..., −2, −1, 0, 1, 2, 
...} or {x | x is an integer}. 


Singleton 


A set having only one element is called a singleton. If a is the element of the singleton 
A, then A is denoted by A = {a}. Note that {a} and a do not mean the same; {a} 
stands for the set consisting of a single element a, while a is just the element of 
{a}. It is the simplest example of a nonempty set. 


Equality of Sets 


Two sets A and B are said to be equal if every member of A is a member of B and 
every member of B is a member of A. We express this by writing A = B. Logically 
speaking, A = B means (x ∈ A) ≡ (x ∈ B), or the biconditional statement (x ∈ A) 
⇔ (x ∈ B) is true for all x. 


Notes: 1. The order of appearance of the elements of a set is of no consequence. 
For example, the set {1, 2, 3} is the same as {2, 3, 1} or {3, 2, 1}, etc. 

2. Each element of a set is written only once. For example, {2, 2, 3} is 
not a proper way of writing a set and it should be written as {2, 3}. 


Universal Set 

Whenever we talk of a set, we shall assume it to be a subset of a fixed set U. This 
fixed set U is called the universal set. 

Subsets 


Let A and B be two sets. If every element of A is an element of B, then A is called 
a subset of B and we write A ⊆ B or B ⊇ A (read as ‘A is contained in B’ or ‘B 
contains A’). 


Logically speaking, A ⊆ B means (x ∈ A) ⇒ (x ∈ B) is true for every x. 


Notes: 1. If A ⊆ B and A ≠ B, we write A ⊂ B or B ⊃ A (read as: A is a proper 
subset of B or B is a proper superset of A). 
2. Every set is a subset and a superset of itself. 
3. If A is not a subset of B, we write A ⊈ B. 


Empty Set or Null Set 


A set which has no element is called the null set or empty set. It is denoted by the 
symbol ∅. 
For example, each of the following is a null set: 
(i) The set of all real numbers whose square is —1. 
(ii) The set of all those integers that are both even and odd. 
(iii) The set of all rational numbers whose square is 2. 
(iv) The set of all those integers x that satisfy the equation 2x = 5. 


Example 3. The empty set ∅ is a subset of every set. 


Solution. Suppose ∅ is not a subset of the set A. This means there exists a ∈ ∅ 
such that a ∉ A. This is impossible as ∅ has no element. So, ∅ is a subset of every 
set. 


Aliter. Logically speaking, we have to prove that the conditional statement 
(x ∈ ∅) ⇒ (x ∈ A) is true for every x. Since ∅ has no element, the statement ‘x ∈ 
∅’ is false. Hence, the conditional statement (x ∈ ∅) ⇒ (x ∈ A) is true, which 
proves the result. 


Example 4. List the following sets (here N denotes the set of natural numbers and 
Z, the set of integers). 
(i) {x | x ∈ N and x < 10} 
(ii) {x | x ∈ Z and x < 6} 
(iii) {x | x ∈ Z and 2 < x < 10} 
Solution. (i) We have to find the natural numbers which are less than 10. They are 
1, 2, 3, 4, 5, 6, 7, 8, 9. The set can be described as {1, 2, 3, 4, 5, 6, 7, 8, 9}. 
(ii) We have to find integers which are less than 6. They are all negative 
integers and the integers 0, 1, 2, 3, 4, 5. The set may be described as, 
{..., −3, −2, −1, 0, 1, 2, 3, 4, 5}. 
(iii) We have to find integers that are between 2 and 10. They are 3, 4, 5, 
6, 7, 8, 9. The set may be described as {3, 4, 5, 6, 7, 8, 9}. 
Example 5. Give the verbal translation of the following sets: 
(i) {2, 4, 6, 8} 
(ii) {1, 3, 5, 7, 9, ...} 
(iii) {−1, 1} 
Solution. (i) It consists of all positive even integers less than 10. 
(ii) It consists of all positive odd integers. 
(iii) It consists of those integers x which satisfy x² − 1 = 0. 
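
Set-builder descriptions of this kind translate almost literally into Python set comprehensions. The sketch below assumes N = {1, 2, 3, ...} and, because a program cannot enumerate all of Z, searches only a finite window of integers (an assumption made for illustration). 

```python
# Finite search window standing in for the integers Z (illustrative only).
window = range(-100, 101)

# {x | x in N and x < 10}, taking N to start at 1
naturals_below_10 = {x for x in window if x >= 1 and x < 10}

# {x | x in Z and 2 < x < 10}
between_2_and_10 = {x for x in window if 2 < x < 10}

# The set {-1, 1}, described as the integers whose square is 1
square_is_one = {x for x in window if x * x == 1}

print(sorted(naturals_below_10))
print(sorted(between_2_and_10))
print(sorted(square_is_one))
```

An unbounded set such as {x | x ∈ Z and x < 6} cannot be listed this way; it can only be described by its membership condition. 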



Example 6. If a₁ ≠ b₁ and {a₁, b₁} = {a₂, b₂}, then show that a₂ ≠ b₂. 
Solution. Let a₂ = b₂. Then a₁ ∈ {a₁, b₁} means a₁ ∈ {a₂, b₂} = {a₂}. So a₁ 
= a₂. Also b₁ ∈ {a₁, b₁} means b₁ ∈ {a₂, b₂} = {a₂}. So, b₁ = a₂. Therefore, a₁ 
= b₁, which is wrong. Thus a₂ ≠ b₂. 
Example 7. If A ⊆ B and B ⊆ C, then A ⊆ C. 
Solution. Let a ∈ A be any element of A. Then, as A ⊆ B, a ∈ B. 

Also B ⊆ C ⇒ a ∈ C. 

Thus every element of A belongs to C ⇒ A ⊆ C. 

Aliter. Logically speaking, we want to prove that, 
[(x ∈ A) ⇒ (x ∈ B)] ∧ [(x ∈ B) ⇒ (x ∈ C)] ⇒ [(x ∈ A) ⇒ (x ∈ C)] 

is true for every x. This follows from the Transitive Law. 
Example 8. If A ⊆ B and B ⊆ A, then A = B. 
Solution. Since A ⊆ B, every element of A is an element of B. Also, B ⊆ A means 
every element of B is also an element of A. This proves A = B. 

Aliter. Logically speaking, we want to prove that, 

[(x ∈ A) ⇒ (x ∈ B)] ∧ [(x ∈ B) ⇒ (x ∈ A)] ⇒ [(x ∈ A) ⇔ (x ∈ B)] is 
true for every x. In other words, [(p ⇒ q) ∧ (q ⇒ p)] ⇒ (p ⇔ q) is true. Since 
p ⇒ q is true and q ⇒ p is true, (p ⇒ q) ∧ (q ⇒ p) is also true. This also means 
that p ⇔ q is true. So, [(p ⇒ q) ∧ (q ⇒ p)] ⇒ (p ⇔ q) is true. This proves the 
result. 


Example 9. If A ⊂ B and B ⊂ C, then A ⊂ C. 


Solution. If A = C, then every element of B is also an element of A (as B ⊆ A). 
But A ⊂ B means every element of A is also an element of B. Combining these 
facts, we get A = B, which is a contradiction (as A is a proper subset of B). So, A ≠ C. 
Clearly, every element of A is also an element of C. Therefore, A is a proper 
subset of C. 


Aliter. If A = C, then B ⊆ A. This means (x ∈ B) ⇒ (x ∈ A) is true for every 
x. 

Also A ⊂ B means (x ∈ A) ⇒ (x ∈ B) is true for all x. Therefore, 
(x ∈ A) ⇔ (x ∈ B) is true for every x. So A = B, which is not possible as A is a 
proper subset of B. Hence, A ≠ C. As A is also a subset of C, A is a proper subset of C. 


Example 10. Find all possible solutions for x and y for each of the following 
cases: 
(i) {2x, y} = {4, 6} 

(ii) {x, 2y} = {1, 2} 

(iii) {2x} = {0} 
Solution. (i) Let A = {2x, y} and B = {4, 6} 

Now 2x ∈ A means 2x ∈ B. So, 2x = 4 or 2x = 6. If 2x = 4, then x = 2. Also 
y ∈ A means y ∈ B. So, y = 4 or y = 6. However, y cannot be equal to 4, for then A 
would have only the element 4 while B has elements 4 and 6. Therefore, one solution 
is x = 2 and y = 6. If 2x = 6, then x = 3. But y cannot be 6, for then A would have only 
the element 6. Therefore, y must be 4. Another solution is x = 3 and y = 4. 
(ii) Let, A = {x, 2y} and B = {1, 2} 
x ∈ A means x ∈ B 
So, x = 1 or x = 2 
If x = 1, then 2y = 2. So, one solution is x = 1 and y = 1 
If x = 2, then 2y = 1. So, another solution is x = 2 and y = 1/2 
(iii) Let, A = {2x}, B = {0} 

2x ∈ A means 2x ∈ B 

So, 2x = 0 which means x = 0 
Therefore, the only solution is x = 0. 


Example 11. Find at least one set A such that, 

(i) {1, 2} ⊂ A ⊂ {1, 2, 3, 4} 

(ii) {0, 1, 2} ⊂ A ⊂ {2, 3, 0, 1, 4} 
Solution. (i) Since {1, 2} ⊂ A, it means A must have 1, 2 as its elements and 
also some other members. Again, A ⊂ {1, 2, 3, 4} means that the extra 
member should be 3 or 4. So A = {1, 2, 3} or A = {1, 2, 4}. 


(ii) Following the solution of case (i) above, there are two possibilities. 
Either A = {0, 1, 2, 3} or A = {0, 1, 2, 4}. 
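
All sets lying strictly between two given sets can be enumerated mechanically. The following Python sketch assumes both inclusions are proper and that the smaller set is contained in the larger one; the function name proper_between is hypothetical. 

```python
from itertools import combinations

def proper_between(lower, upper):
    """All sets A with lower ⊂ A ⊂ upper, both inclusions being proper.

    Assumes lower is a subset of upper.
    """
    lower, upper = frozenset(lower), frozenset(upper)
    extras = sorted(upper - lower)
    found = []
    # Add at least one extra element, but never all of them.
    for k in range(1, len(extras)):
        for chosen in combinations(extras, k):
            found.append(lower | frozenset(chosen))
    return found

for a in proper_between({1, 2}, {1, 2, 3, 4}):
    print(sorted(a))
```

For the sets of case (i) this prints the two possibilities found by hand; the same call with {0, 1, 2} and {2, 3, 0, 1, 4} handles case (ii). 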


Venn Diagrams 


Venn diagrams are used to illustrate various set operations. They are named after John 
Venn (1834-1923). You can represent the universal set by the points in and on a 
rectangle and subsets A, B, C, ... by points in and on the circles or ellipses drawn 
inside the rectangle. In Figure 1.5, the shaded portion represents A ∩ B. 


Fig. 1.5 


In Figure 1.6, the shaded portion represents A U B. 


Fig. 1.6 
In Figure 1.7, the shaded portion represents A′. 


Fig. 1.7 


In Figure 1.8, the shaded portion represents A — B. 


Fig. 1.8 


In Figure 1.9, three sets A, B, C divide the universal set U into 8 parts. The eighth 
part is not numbered in the Venn diagram. 


Fig. 1.9 


Example 12. Prove that A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) using Venn diagrams. 
Solution. This can be proved with reference to Figure 1.9. B ∩ C is represented 
by areas 4 and 7, and A is represented by areas 1, 2, 6 and 7. So, A ∪ (B ∩ C) 
is represented by areas 1, 2, 4, 6 and 7. Again, areas 1, 2, 4, 5, 6, 7 represent A 
∪ B and areas 1, 2, 3, 4, 6, 7 represent A ∪ C. So, areas 1, 2, 4, 6, 7 represent 
(A ∪ B) ∩ (A ∪ C). This proves our assertion. 


Example 13. Using Venn diagrams show that A − (B ∪ C) = (A − B) ∩ (A − 
C). 

Solution. To prove this see Figure 1.9. Areas 2, 3, 4, 5, 6, 7 represent B ∪ C. 
Therefore area 1 represents A − (B ∪ C). Now areas 1, 2 represent A − B and 
areas 1, 6 represent A − C, so area 1 represents (A − B) ∩ (A − C). This proves 
the result. 


Example 14. Using Venn diagrams show that for any two sets A and B, 
(A ∩ B)′ = A′ ∪ B′ 
Solution. In the Figure below, area 1 represents A ∩ B while areas 2, 3, 4 represent 
(A ∩ B)′. Again areas 3, 4 represent A′ and areas 2, 4 represent B′. Therefore 
areas 2, 3, 4 represent A′ ∪ B′. 


Example 15. Use Venn diagrams to show that for any sets A and B, 

A ∪ B = A ∪ (B − A) 
Solution. Refer to the Figure given in Example 14. Areas 1, 2, 3 represent A ∪ B. Also 
areas 1, 2 represent A and area 3 represents B − A. So, areas 1, 2, 3 represent 
A ∪ (B − A). This proves the result. 
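
Checking an identity area by area on a Venn diagram can be imitated in code: each of the eight parts into which three sets divide the universal set corresponds to one combination of ‘in A’, ‘in B’, ‘in C’. The Python sketch below is only an illustration of that idea; the helper name identity_holds is an assumption. 

```python
from itertools import product

def identity_holds(lhs, rhs):
    """True if lhs and rhs agree on every region of a three-set Venn diagram.

    A region is given by three booleans (in_a, in_b, in_c), so there are
    2**3 = 8 regions in all.
    """
    return all(lhs(a, b, c) == rhs(a, b, c)
               for a, b, c in product([False, True], repeat=3))

# A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
distributive = identity_holds(lambda a, b, c: a or (b and c),
                              lambda a, b, c: (a or b) and (a or c))

# (A ∩ B)' = A' ∪ B'
de_morgan = identity_holds(lambda a, b, c: not (a and b),
                           lambda a, b, c: (not a) or (not b))

# A ∪ B = A ∪ (B − A)
union_split = identity_holds(lambda a, b, c: a or b,
                             lambda a, b, c: a or (b and not a))

print(distributive, de_morgan, union_split)
```

A statement that fails on even one region is not an identity, which makes the same helper usable for spotting a wrongly stated formula. 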


Operations with Sets 


The reader is familiar with the operations of addition and multiplication in Arithmetic. 
For any two given numbers, the operations of addition and multiplication associate 
another number which is called the sum or product of the two numbers respectively. In 
this section, we will define three operations, each of which associates a third set with 
any two given sets. These three operations, namely union, intersection and complement, 
are analogous to the operations of addition, multiplication and subtraction of numbers 
respectively. 


Union 
The union of any two sets A and B is the set of all those elements x such that x 
belongs to at least one of the two sets A and B. It is denoted by A ∪ B. Logically 
speaking, if the biconditional statement (x ∈ C) ⇔ (x ∈ A) ∨ (x ∈ B) is true for all 
x, then C = A ∪ B. In other words, (x ∈ A ∪ B) ≡ (x ∈ A) ∨ (x ∈ B). 


Example 16. Prove that for any sets A and B (i) A ⊆ A ∪ B, (ii) B ⊆ A ∪ B. 
Solution. (i) x ∈ A means x ∈ A ∪ B, by definition. Therefore, A ⊆ A ∪ B. 

(ii) x ∈ B means x ∈ A ∪ B, by definition. Therefore, B ⊆ A ∪ B. 

Aliter. (i) We want to prove that the conditional statement, 

(x ∈ A) ⇒ (x ∈ A ∪ B) is true 

But this statement is false only if (x ∈ A) is true and (x ∈ A ∪ B) is false. Such a 
situation cannot occur, for (x ∈ A) is true means that (x ∈ A) ∨ (x ∈ B) 
is true. If, in addition, (x ∈ A ∪ B) were false, then (x ∈ 
A) ∨ (x ∈ B) ⇒ (x ∈ A ∪ B) would be false. This is impossible by definition of A ∪ B. 
Similarly, we can prove case (ii). 
Example 17. If A ⊆ B, then A ∪ B = B and conversely, if A ∪ B = B, then A ⊆ B. 


Solution. Suppose A ⊆ B. Let x ∈ A ∪ B. Then x ∈ A, or x ∈ B, or x ∈ (A and 
B). If x ∈ A, then x ∈ B (as A ⊆ B). In any case, x ∈ A ∪ B means x ∈ B. So, A 
∪ B ⊆ B. We have already proved B ⊆ A ∪ B. Therefore, A ∪ B = B. Conversely, 
let A ∪ B = B. Let x ∈ A. Then x ∈ A ∪ B, which means x ∈ B. Hence, A ⊆ B. 


Aliter. Suppose A ⊆ B. We have to prove that the biconditional statement 
(x ∈ B) ⇔ (x ∈ A) ∨ (x ∈ B) is true for every x. But this statement is false if and 
only if (x ∈ B) is false and (x ∈ A) is true. Such a situation cannot occur as A ⊆ B. 

This proves A ∪ B = B. 

Conversely, if A ∪ B = B, then we want to show that the conditional statement 
(x ∈ A) ⇒ (x ∈ B) is true for every x. This is false if and only if (x ∈ A) is true and 
(x ∈ B) is false. Now (x ∈ A) is true means (x ∈ A) ∨ (x ∈ B) is true. Therefore, 
(x ∈ A) ∨ (x ∈ B) ⇒ (x ∈ B) is false. This is impossible as B = A ∪ B. This proves 
A ⊆ B. 


Example 18. If A ⊆ C and B ⊆ C, then (A ∪ B) ⊆ C. 
Solution. We want to show that (x ∈ A ∪ B) ⇒ (x ∈ C) is true for every x. This 
is equivalent to saying that (x ∈ A ∪ B) being true and (x ∈ C) being false cannot occur 
together. Suppose (x ∈ A ∪ B) is true. Then (x ∈ A) ∨ (x ∈ B) is true. This means 
(x ∈ A) is true or (x ∈ B) is true. If (x ∈ A) is true then (x ∈ C) is true as A ⊆ C. 
If (x ∈ B) is true then (x ∈ C) is true as B ⊆ C. In any case (x ∈ C) is true. So, 
when (x ∈ A ∪ B) is true, (x ∈ C) should also be true. This proves our assertion. 
Aliter. Let x ∈ A ∪ B. This means x ∈ A or x ∈ B or x ∈ (A and B). If x 
∈ A, then x ∈ C (as A ⊆ C). If x ∈ B, then x ∈ C (as B ⊆ C). In any case, x ∈ 
C. So, x ∈ A ∪ B means x ∈ C. 
This proves A ∪ B ⊆ C. 


Intersection 
The intersection of two sets A and B is the set of all those elements x such that x 
belongs to both A and B and is denoted by A ∩ B. If A ∩ B = ∅, then A and B are 
said to be disjoint. 
Logically speaking, if the biconditional statement (x ∈ C) ⇔ (x ∈ A) ∧ (x ∈ 
B) is true for all x, then C = A ∩ B. Hence, 
(x ∈ A ∩ B) ≡ (x ∈ A) ∧ (x ∈ B) 


Example 19. Show that for any sets A and B (i) A ∩ B ⊆ A (ii) A ∩ B ⊆ B 
Solution. Let x ∈ A ∩ B. Then, by definition x ∈ A and x ∈ B. Therefore, 
A ∩ B ⊆ A and A ∩ B ⊆ B. 

Aliter. (i) We have to prove that, 

(x ∈ A ∩ B) ⇒ (x ∈ A) is true for all x. 

Now, consider the case when (x ∈ A ∩ B) is true and (x ∈ A) is false. Here, 
(x ∈ A) is false means (x ∈ A) ∧ (x ∈ B) is false and so (x ∈ A ∩ B) ⇒ (x ∈ A) ∧ 
(x ∈ B) is also false, which is impossible by definition of A ∩ B. This proves the 
result. 

(ii) We have to prove that, 

(x ∈ A ∩ B) ⇒ (x ∈ B) is true for all x. 

The only doubtful case is when (x ∈ A ∩ B) is true and (x ∈ B) is false. This 
is not possible according to the definition of A ∩ B. Hence proved. 


Example 20. If A ⊆ B and A ⊆ C, then A ⊆ (B ∩ C). 

Solution. Let x ∈ A. Then x ∈ B and x ∈ C (as A ⊆ B and A ⊆ C). 
So, x ∈ B ∩ C. 
This proves that A ⊆ B ∩ C. 


Aliter. We have to prove that, 
(x ∈ A) ⇒ (x ∈ B ∩ C) is true for all x. 
The only doubtful case is when (x ∈ A) is true and (x ∈ B ∩ C) is false. 
Now (x ∈ A) is true means (x ∈ B) is also true (as A ⊆ B). Also (x ∈ C) is true 
(as A ⊆ C). This means (x ∈ B) ∧ (x ∈ C) is true and therefore (x ∈ B ∩ C) is 
true. This proves the result. 
Example 21. A ∪ B = A ∩ B if and only if A = B. 


Solution. Suppose A ∪ B = A ∩ B. Let x ∈ A. Then x ∈ A ∪ B and so x ∈ A ∩ B. 
Therefore, x ∈ B. This proves that A ⊆ B. Similarly B ⊆ A and hence A = B. 


Aliter. Suppose A ∪ B = A ∩ B. 
According to the Absorption law, 
(x ∈ A) ≡ (x ∈ A) ∧ [(x ∈ A) ∨ (x ∈ B)] 
≡ (x ∈ A) ∧ [x ∈ A ∪ B] 
≡ (x ∈ A) ∧ [x ∈ A ∩ B] 
≡ [(x ∈ A) ∧ (x ∈ A)] ∧ (x ∈ B) 
≡ (x ∈ A) ∧ (x ∈ B) 
≡ (x ∈ A ∩ B) 
≡ (x ∈ A ∪ B) 
≡ (x ∈ B) ∨ (x ∈ A) 
≡ (x ∈ B) ∨ [(x ∈ B) ∨ (x ∈ A)] 
≡ (x ∈ B) ∨ [x ∈ B ∪ A] 
≡ (x ∈ B) ∨ [x ∈ A ∩ B] 
≡ (x ∈ B) ∨ [(x ∈ A) ∧ (x ∈ B)] 
≡ [(x ∈ B) ∨ (x ∈ A)] ∧ (x ∈ B) 
≡ (x ∈ B), by the Absorption law 
This proves that A = B. 
Conversely, if A = B, then 
(x ∈ A ∪ B) ≡ (x ∈ A) ∨ (x ∈ B) 
≡ (x ∈ B) ∨ (x ∈ B) 
≡ (x ∈ B) 
≡ (x ∈ B) ∧ (x ∈ B) 
≡ (x ∈ A) ∧ (x ∈ B) 
≡ (x ∈ A ∩ B) 
Note: The Absorption law in logic means that, 
(i) p ∧ (p ∨ r) ≡ p (ii) p ∨ (p ∧ r) ≡ p 
Complements 


If A and B are two sets then the complement of B relative to A is the set of all those 
elements x ∈ A such that x ∉ B and is denoted by A − B. Logically speaking, if for 
a set C the biconditional statement (x ∈ C) ⇔ (x ∈ A) ∧ (x ∉ B) is true for all x, 
then C = A − B. In other words, if (x ∈ C) ≡ (x ∈ A) ∧ (x ∉ B) then C is called the 
complement of B relative to A. 


Notes: 1. It follows from the above definition that A − B is a subset of A. 
2. Whenever we say complement of B we mean complement of B 
relative to the universal set U. In such cases, we denote the complement 
of B by B′. 
So, B′ = U − B. 
Example 22. Show that A − B = A ∩ B′. 
Solution. Let x ∈ A − B. This means x ∈ A and x ∉ B. By definition of the 
universal set, A − B ⊆ U. So, x ∈ U. Therefore x ∈ U, x ∉ B implies that x ∈ B′. 
This proves that A − B ⊆ A ∩ B′. Again, if x ∈ A ∩ B′, then x ∈ A and x ∈ B′. 
Now x ∈ B′ implies that x ∉ B. So x ∈ A − B. This proves that A ∩ B′ ⊆ A − B. 


Therefore A − B = A ∩ B′. 

Aliter. (x ∈ A − B) ≡ (x ∈ A) ∧ (x ∉ B) 
≡ (x ∈ A ∩ U) ∧ (x ∉ B), as A ∩ U = A 
≡ [(x ∈ A) ∧ (x ∈ U)] ∧ (x ∉ B) 
≡ (x ∈ A) ∧ [(x ∈ U) ∧ (x ∉ B)] 
≡ (x ∈ A) ∧ (x ∈ B′) 

This proves that A − B = A ∩ B′. 


Example 23. Prove that A ⊆ B if and only if B′ ⊆ A′. 
Solution. Suppose A ⊆ B. Let x ∈ B′. Then x ∈ U and x ∉ B. Now x ∉ B 
implies that x ∉ A (as A ⊆ B). Therefore x ∈ U and x ∉ A implies that x ∈ A′. 
This proves that B′ ⊆ A′. Conversely, let B′ ⊆ A′. Let x ∈ A. Then x ∉ A′. Now 
x ∉ A′ implies that x ∉ B′ (as B′ ⊆ A′). This means that x ∈ B. Hence, A ⊆ B. 
Aliter. Now (x ∈ A) ⇒ (x ∈ B) 

≡ ~(x ∈ B) ⇒ ~(x ∈ A) (by Contrapositive law in logic) 
≡ (x ∉ B) ⇒ (x ∉ A) 

≡ (x ∈ B′) ⇒ (x ∈ A′) 

Suppose A ⊆ B. Then (x ∈ A) ⇒ (x ∈ B) is true for all x. By the above equivalence, 
(x ∈ B′) ⇒ (x ∈ A′) is true for all x. This means B′ ⊆ A′. Conversely, suppose 
B′ ⊆ A′. Then (x ∈ B′) ⇒ (x ∈ A′) is true for all x. By the same equivalence, (x ∈ A) ⇒ 
(x ∈ B) is true for all x. This implies that A ⊆ B. Hence proved. 


Algebra of Sets 

The following are some of the important laws of sets. 
1. Law of Idempotence. For any set A, 
A ∪ A = A and A ∩ A = A 


2. Commutative Law. For any sets A and B, 
A ∪ B = B ∪ A, A ∩ B = B ∩ A 
3. Associative Law. For any three sets A, B, C, 
(i) A ∪ (B ∪ C) = (A ∪ B) ∪ C 
(ii) A ∩ (B ∩ C) = (A ∩ B) ∩ C 
Proof. (i) We have to prove that, 
[x ∈ A ∪ (B ∪ C)] ⇔ [x ∈ (A ∪ B) ∪ C] is true for all x. 
Now by definition, 
[x ∈ A ∪ (B ∪ C)] ≡ [(x ∈ A) ∨ {(x ∈ B) ∨ (x ∈ C)}] 
and [x ∈ (A ∪ B) ∪ C] ≡ [{(x ∈ A) ∨ (x ∈ B)} ∨ (x ∈ C)] 
Hence, as per the associative law in logic, the result of case (i) follows. 
Case (ii) can be proved similarly and is left as an exercise. 


Distributive Laws 
For any three sets A, B, C, 
(i) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) 
(ii) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) 
Proof. (i) Let x ∈ A ∩ (B ∪ C). This implies that x ∈ A and x ∈ B ∪ C. Now, 
x ∈ B ∪ C implies that x ∈ B or x ∈ C or x ∈ both B and C. If x ∈ B, then x ∈ 
A ∩ B. If x ∈ C, then x ∈ A ∩ C. In any case x ∈ (A ∩ B) ∪ (A ∩ C). 

So, A ∩ (B ∪ C) ⊆ (A ∩ B) ∪ (A ∩ C). 

Similarly (A ∩ B) ∪ (A ∩ C) ⊆ A ∩ (B ∪ C). This proves case (i). 

Similarly, we can prove case (ii). 

Aliter. x ∈ [A ∩ (B ∪ C)] ≡ [(x ∈ A) ∧ (x ∈ B ∪ C)] 
≡ [(x ∈ A) ∧ {(x ∈ B) ∨ (x ∈ C)}] 
≡ [{(x ∈ A) ∧ (x ∈ B)} ∨ {(x ∈ A) ∧ (x ∈ C)}] 
(by Distributive Law of logic) 
≡ [x ∈ (A ∩ B)] ∨ [x ∈ (A ∩ C)] 
≡ [x ∈ (A ∩ B) ∪ (A ∩ C)] 

So, A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) 

Similarly, we can prove case (ii) by laws of logic. 
De Morgan’s Laws 
For any two sets A and B, 
(i) (A ∪ B)′ = A′ ∩ B′ 
(ii) (A ∩ B)′ = A′ ∪ B′ 
Proof. (i) Let x ∈ (A ∪ B)′. This implies that x ∉ A ∪ B and x ∈ U. Now 
x ∉ A ∪ B implies that x ∉ A and x ∉ B. But x ∉ A and x ∈ U implies that x ∈ 
A′, and x ∉ B and x ∈ U implies that x ∈ B′. Therefore, x ∈ A′ ∩ B′ and so (A 
∪ B)′ ⊆ A′ ∩ B′. Similarly, A′ ∩ B′ ⊆ (A ∪ B)′. 
This proves that (A ∪ B)′ = A′ ∩ B′ 
Alternative proof of case (i) using logic: 
Now x ∈ (A ∪ B)′ ≡ ~[x ∈ (A ∪ B)] 
≡ ~[(x ∈ A) ∨ (x ∈ B)] 
≡ ~(x ∈ A) ∧ ~(x ∈ B) 
≡ (x ∉ A) ∧ (x ∉ B) 
≡ (x ∈ A′) ∧ (x ∈ B′) 
≡ (x ∈ A′ ∩ B′) 
Therefore, (A ∪ B)′ = A′ ∩ B′ 
The proof of case (ii) is left as an exercise. 


Example 24. Let A, B, C be any three sets. Prove that, 
A ∩ (B − C) = (A ∩ B) − (A ∩ C) 
Solution. (A ∩ B) − (A ∩ C) = (A ∩ B) ∩ (A ∩ C)′ 
= (A ∩ B) ∩ (A′ ∪ C′)                    by De Morgan’s law 
= [(A ∩ B) ∩ A′] ∪ [(A ∩ B) ∩ C′]        by Distributive law 
= [(A ∩ A′) ∩ B] ∪ [A ∩ (B ∩ C′)]        by Associative law 
= [∅ ∩ B] ∪ [A ∩ (B ∩ C′)] 
= ∅ ∪ [A ∩ (B ∩ C′)] 
= A ∩ (B ∩ C′) 
= A ∩ (B − C) 


Example 25. For any sets A and B, show that, 
(A − B) ∪ (B − A) = (A ∪ B) − (A ∩ B) 
Solution. (A ∪ B) − (A ∩ B) = (A ∪ B) ∩ (A ∩ B)′ 
= (A ∪ B) ∩ (A′ ∪ B′)                           by De Morgan’s law 
= [(A ∪ B) ∩ A′] ∪ [(A ∪ B) ∩ B′]               by Distributive law 
= [(A ∩ A′) ∪ (B ∩ A′)] ∪ [(A ∩ B′) ∪ (B ∩ B′)] 
= [∅ ∪ (B ∩ A′)] ∪ [(A ∩ B′) ∪ ∅] 
= (B ∩ A′) ∪ (A ∩ B′) 
= (B − A) ∪ (A − B) 
= (A − B) ∪ (B − A)                             by Commutative law 
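
The set (A − B) ∪ (B − A) is what Python's built-in sets call the symmetric difference, written with the ^ operator. As a quick numerical check on two small sets chosen only for illustration: 

```python
# Two small sets chosen only for illustration.
A = {1, 2, 3, 4}
B = {3, 4, 5}

left = (A - B) | (B - A)     # (A − B) ∪ (B − A)
right = (A | B) - (A & B)    # (A ∪ B) − (A ∩ B)

print(sorted(left), sorted(right), sorted(A ^ B))
```

A check on one pair of sets is not a proof, but it is a cheap way to catch a mis-stated identity before attempting one. 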


Finite Sets 

If A is a finite set, then we shall denote the number of elements in A by n(A). If A 
and B are two finite sets, then it is very clear from the Venn diagram of A − B that, 
n(A − B) = n(A) − n(B ∩ A) 

Suppose A and B are two finite sets such that A ∩ B = ∅. Then, the number of 
elements in A ∪ B is the sum of the number of elements in A and the number of 
elements in B. 

i.e., n(A ∪ B) = n(A) + n(B) if A ∩ B = ∅ 
The number of elements in A ∪ B, in case A ∩ B ≠ ∅, can be 
found as follows: 
For any two sets A and B, 
A ∪ B = A ∪ (B − A) 
Here, A ∩ (B − A) = ∅ 
Therefore, n(A ∪ B) = n(A) + n(B − A) 
= n(A) + n(B) − n(A ∩ B) 
Note: According to the definition of empty set it follows that n(∅) = 0. 
Therefore, if A and B are two finite sets, then, 
n(A ∪ B) = n(A) + n(B) − n(A ∩ B) 
Similarly, if A, B, C are three finite sets, then, 
n(A ∪ B ∪ C) = n(A ∪ B) + n(C) − n[(A ∪ B) ∩ C] 
= n(A) + n(B) − n(A ∩ B) + n(C) − n[(A ∪ B) ∩ C] 
= n(A) + n(B) + n(C) − n(A ∩ B) − n[(A ∩ C) ∪ (B ∩ C)] 
= n(A) + n(B) + n(C) − n(A ∩ B) − [n(A ∩ C) 
+ n(B ∩ C) − n{(A ∩ C) ∩ (B ∩ C)}] 
= n(A) + n(B) + n(C) − n(A ∩ B) − n(A ∩ C) − n(B ∩ C) 
+ n(A ∩ B ∩ C), as (A ∩ C) ∩ (B ∩ C) = A ∩ B ∩ C 
These two results are used in the following problems. 
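
The three-set formula can be checked against a direct count. In the Python sketch below the function name union_size and the three sample sets are illustrative assumptions. 

```python
def union_size(a, b, c):
    """n(A ∪ B ∪ C) computed from the three-set counting formula."""
    return (len(a) + len(b) + len(c)
            - len(a & b) - len(a & c) - len(b & c)
            + len(a & b & c))

# Three small sets chosen only for illustration.
A = {1, 2, 3, 4}
B = {3, 4, 5}
C = {4, 5, 6, 7}

# The formula and the direct count of the union should agree.
print(union_size(A, B, C), len(A | B | C))
```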


Example 26. In a recent survey of 400 students in a school, 100 were listed as 
smokers and 150 as chewers of gum; 75 were listed as both smokers and gum 
chewers. Find out how many students are neither smokers nor gum chewers. 
Solution. Let U be the set of students questioned. Let A be the set of smokers, 
and B the set of gum chewers. 

Then, n(U) = 400, n(A) = 100, n(B) = 150, n(A ∩ B) = 75 

We have to find out n(A′ ∩ B′) 
Now, A′ ∩ B′ = (A ∪ B)′ = U − (A ∪ B) 
Therefore, 
n(A′ ∩ B′) = n[U − (A ∪ B)] 
= n(U) − n[(A ∪ B) ∩ U] 
= n(U) − n(A ∪ B) 
= n(U) − n(A) − n(B) + n(A ∩ B) 
= 400 − 100 − 150 + 75 
= 225 
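
The same answer can be reached by building explicit sets with the given sizes and letting the computer count. The student numbers used below are hypothetical labels; only the four counts come from the example. 

```python
# Hypothetical student numbers 0..399; only the counts come from the survey.
students = set(range(400))            # n(U) = 400
smokers = set(range(0, 100))          # n(A) = 100
gum_chewers = set(range(25, 175))     # n(B) = 150, sharing 25..99 with A

# The overlap has the size stated in the survey.
assert len(smokers & gum_chewers) == 75

# A' ∩ B' = U − (A ∪ B)
neither = students - (smokers | gum_chewers)
print(len(neither))
```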


Example 27. Out of 500 car owners investigated, 400 owned Fiat cars and 200 
owned Ambassador cars; 50 owned both Fiat and Ambassador cars. Is this data 
correct? 


Solution. Let U be the set of car owners investigated. Let A be the set of those 
persons who own Fiat cars and B the set of persons who own Ambassador cars; 
then A ∩ B is the set of persons who own both Fiat and Ambassador cars. 


n(U) = 500, n(A) = 400, n(B) = 200, n(A ∩ B) = 50 
Therefore, n(A ∪ B) = n(A) + n(B) − n(A ∩ B) 
= 400 + 200 − 50 = 550 
This exceeds the total number of car owners investigated. 
So, the given data is not correct. 
Example 28. A market research group conducted a survey of 1000 consumers 
and reported that 720 consumers liked product A and 450 consumers liked product 
B. What is the least number that must have liked both the products? 


Solution. Let, U = Set of consumers questioned 
S = Set of consumers who liked product A 
T = Set of consumers who liked product B 


Then, S ∩ T = Set of consumers who liked both the products A and B 
Now, n(U) = 1000, n(S) = 720, n(T) = 450 
Therefore, n(S ∪ T) = n(S) + n(T) − n(S ∩ T) 
= 1170 − n(S ∩ T) 
So, n(S ∩ T) = 1170 − n(S ∪ T) 


Now, n(S ∩ T) is least when n(S ∪ T) is maximum. But S ∪ T ⊆ U implies that 
n(S ∪ T) ≤ n(U). 

This implies that the maximum value of n(S ∪ T) is 1000. 

So, the least value of n(S ∩ T) = 170 

Hence, the least number of consumers who liked both the products A and B is 
170. 
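
The reasoning above amounts to the bound n(S ∩ T) ≥ n(S) + n(T) − n(U), with zero as the floor when the right-hand side is negative. A minimal Python sketch follows; the function name least_overlap is hypothetical. 

```python
def least_overlap(n_u, n_s, n_t):
    """Smallest possible n(S ∩ T) for subsets S, T of a universal set U.

    Follows from n(S ∪ T) = n(S) + n(T) − n(S ∩ T) and n(S ∪ T) ≤ n(U);
    an overlap can never be negative, hence the max with 0.
    """
    return max(0, n_s + n_t - n_u)

print(least_overlap(1000, 720, 450))
```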


Example 29. Out of 1000 students who appeared for C.A. Intermediate 
Examination, 750 failed in Maths, 600 failed in Accounts and 600 failed in Costing, 
450 failed in both Maths and Accounts, 400 failed in both Maths and Costing, 
150 failed in both Accounts and Costing. The students who failed in all the three 
subjects were 75. Prove that the above data is not correct. 


Solution. 
Let, U = Set of students who appeared in the examination. 
A = Set of students who failed in Maths. 
B = Set of students who failed in Accounts. 
C = Set of students who failed in Costing. 
Then, A ∩ B = Set of students who failed in both Maths and Accounts. 
B ∩ C = Set of students who failed in both Accounts and Costing. 
A ∩ C = Set of students who failed in both Maths and Costing. 
A ∩ B ∩ C = Set of students who failed in all the three subjects. 
Now, n(U) = 1000, n(A) = 750, n(B) = 600, n(C) = 600, n(A ∩ B) = 450, 
n(B ∩ C) = 150, n(A ∩ C) = 400, n(A ∩ B ∩ C) = 75 
Therefore, n(A ∪ B ∪ C) = 750 + 600 + 600 − 450 − 150 − 400 + 75 
= 1025 
This exceeds the total number of students who appeared in the examination. 

Hence, the given data is not correct. 


Example 30. A factory inspector examined the defects in hardness, finish and 
dimensions of an item. After examining 100 items he gave the following report: 

All three defects 5, defects in hardness and finish 10, defects in dimension and 
finish 8, defects in dimension and hardness 20. Defects in finish 30, in hardness 
23, in dimension 50. The inspector was fined. Why? 
Solution. Suppose H represents the set of items which have defect in hardness, F 
represents the set of items which have defect in finish and D represents the set of 
items which have defect in dimension. 

Then, n(H ∩ F ∩ D) = 5, n(H ∩ F) = 10, n(D ∩ F) = 8 

n(D ∩ H) = 20, n(F) = 30, n(H) = 23, n(D) = 50 


So, n(H ∪ F ∪ D) = 30 + 23 + 50 − 20 − 10 − 8 + 5 = 70 
Now, n(D ∪ F) = n(D) + n(F) − n(D ∩ F) 
= 50 + 30 − 8 = 72 


D ∪ F ⊆ D ∪ F ∪ H implies that n(D ∪ F) ≤ n(D ∪ F ∪ H), i.e., 72 ≤ 70, which is false. 
Hence, there is an error in the report and for this reason the inspector was fined. 
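
Another way to expose inconsistent data of this kind is to work out the count in each separate region of the three-set Venn diagram: no region can hold a negative number of items. The Python sketch below uses this check, which is a different route from the one taken in the solution above; the function name venn_regions and the region labels are illustrative assumptions. 

```python
def venn_regions(n_a, n_b, n_c, n_ab, n_ac, n_bc, n_abc):
    """Counts in the seven regions inside A ∪ B ∪ C, from the reported totals."""
    only_ab = n_ab - n_abc   # in A and B but not in C
    only_ac = n_ac - n_abc   # in A and C but not in B
    only_bc = n_bc - n_abc   # in B and C but not in A
    return {
        "A only": n_a - only_ab - only_ac - n_abc,
        "B only": n_b - only_ab - only_bc - n_abc,
        "C only": n_c - only_ac - only_bc - n_abc,
        "A and B only": only_ab,
        "A and C only": only_ac,
        "B and C only": only_bc,
        "A and B and C": n_abc,
    }

# The inspector's report, with A = hardness, B = finish, C = dimension.
report = venn_regions(n_a=23, n_b=30, n_c=50,
                      n_ab=10, n_ac=20, n_bc=8, n_abc=5)
impossible = {name: count for name, count in report.items() if count < 0}
print(impossible)
```

Any region listed in the output would contain fewer than zero items, so a non-empty result means the report cannot be correct. 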


Example 31. In a survey of 100 families the numbers that read the most recent 
issues of various magazines were found to be as follows: 


Readers Digest                         28 
Readers Digest and Science Today        8 
Science Today                          30 
Readers Digest and Caravan             10 
Caravan                                42 
Science Today and Caravan               5 
All the three Magazines                 3 
Using set theory, find 


(i) How many read none of the three magazines? 
(ii) How many read Caravan as their only magazine? 
(iii) How many read Science Today ifand only if they read Caravan? 
Solution. Let, S = Set of those families who read Science Today
R = Set of those families who read Readers Digest
C = Set of those families who read Caravan
(i) Find n(S' ∩ R' ∩ C')
Let U = Set of the families questioned.

Now, S' ∩ R' ∩ C' = (S ∪ R ∪ C)'
= U - (S ∪ R ∪ C)

Therefore, n(S' ∩ R' ∩ C') = n(U) - n(S ∪ R ∪ C)
= 100 - n(S ∪ R ∪ C)

Now, n(S ∪ R ∪ C) = 30 + 28 + 42 - 8 - 10 - 5 + 3 = 80

So, n(S' ∩ R' ∩ C') = 100 - 80 = 20

(ii) Find n[C - (R ∪ S)]

Now, n[C - (R ∪ S)] = n(C) - n[C ∩ (R ∪ S)]
= n(C) - n[(C ∩ R) ∪ (C ∩ S)]
= n(C) - n(C ∩ R) - n(C ∩ S) + n(C ∩ R ∩ S)
= 42 - 10 - 5 + 3
= 30

(iii) Find n[(S ∩ C) - R]
Now, n[(S ∩ C) - R] = n(S ∩ C) - n(S ∩ C ∩ R)
= 5 - 3 = 2
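
The three counts in Example 31 can be cross-checked with a short Python sketch. The variable and function names are illustrative choices, not part of the text; the numbers are the survey figures given in the example.

```python
# Survey counts from Example 31 (100 families questioned).
n_U = 100
n_R, n_S, n_C = 28, 30, 42        # Readers Digest, Science Today, Caravan
n_RS, n_RC, n_SC = 8, 10, 5       # pairwise intersections
n_RSC = 3                         # all three magazines


def union_of_three(a, b, c, ab, ac, bc, abc):
    """Inclusion-exclusion count for the union of three finite sets."""
    return a + b + c - ab - ac - bc + abc


n_union = union_of_three(n_R, n_S, n_C, n_RS, n_RC, n_SC, n_RSC)
none_read = n_U - n_union                   # part (i)
caravan_only = n_C - n_RC - n_SC + n_RSC    # part (ii)
s_and_c_not_r = n_SC - n_RSC                # part (iii)

print(n_union, none_read, caravan_only, s_and_c_not_r)  # 80 20 30 2
```

The same union_of_three helper reproduces the consistency checks of the earlier examples: a union larger than n(U), or smaller than the union of two of its parts, signals faulty data.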
Example 32. In a survey conducted of women it was found that,
(i) There are more single than married women in South Delhi.
(ii) There are more married women who own cars than unmarried women
without cars.
(iii) There are fewer single women who own cars and homes than married
women who are without cars and without homes.
Is the number of single women who own cars and do not own homes greater
than the number of married women who do not own cars but own homes?
Solution. Let, A = Set of married women
B = Set of women who own cars
C = Set of women who own homes
Then, the given conditions are,
(i) n(A') > n(A)

(ii) n(A ∩ B) > n(A' ∩ B')

(iii) n(A ∩ B' ∩ C') > n(A' ∩ B ∩ C)

Find n(A' ∩ B ∩ C') and n(A ∩ B' ∩ C)

Let, U = Set of all women questioned.

Now, A' = A' ∩ U = A' ∩ (B ∪ B') = (A' ∩ B) ∪ (A' ∩ B')
A = A ∩ U = A ∩ (B ∪ B') = (A ∩ B) ∪ (A ∩ B')
So, n(A') = n(A' ∩ B) + n(A' ∩ B')
n(A) = n(A ∩ B) + n(A ∩ B')
According to case (i), we have,
n(A' ∩ B) + n(A' ∩ B') > n(A ∩ B) + n(A ∩ B')
Also, by case (ii) we have,
n(A' ∩ B) + n(A' ∩ B') > n(A ∩ B) + n(A ∩ B') > n(A' ∩ B') + n(A ∩ B')

Therefore,
n(A' ∩ B) > n(A ∩ B')
Also, A' ∩ B = (A' ∩ B) ∩ (C ∪ C')
= (A' ∩ B ∩ C) ∪ (A' ∩ B ∩ C')
And A ∩ B' = (A ∩ B') ∩ (C ∪ C')
= (A ∩ B' ∩ C) ∪ (A ∩ B' ∩ C')
So, n(A' ∩ B) = n(A' ∩ B ∩ C) + n(A' ∩ B ∩ C')
n(A ∩ B') = n(A ∩ B' ∩ C) + n(A ∩ B' ∩ C')
Using case (iii) we get,
n(A' ∩ B ∩ C) + n(A' ∩ B ∩ C') > n(A ∩ B' ∩ C) + n(A ∩ B' ∩ C')
> n(A ∩ B' ∩ C) + n(A' ∩ B ∩ C)
i.e., n(A' ∩ B ∩ C') > n(A ∩ B' ∩ C)

So, the number of single women who own cars and do not own a home is
greater than the number of married women who do not own cars but own homes.


1.3.1 Counting Principles 


The probability of a successful outcome is calculated as the number of successful 
outcomes divided by the total number of possible outcomes. When the number 
of total outcomes is comparatively small we can list them all and this constitutes 
the entire sample space. This task becomes cumbersome when the number of 
possible outcomes is large. For such situations some counting methods have 
been developed, and this makes it easier to calculate the number of all possible 
outcomes of all events. For example, if we roll a die, we know that there are 
6 possible outcomes, namely, 1, 2, 3, 4, 5, 6. When a second die is rolled, the 
number of possible outcomes for both dice together increases to 6 × 6 = 36.
If the die is rolled four times, the number of possible outcomes becomes
6 × 6 × 6 × 6 = 1296. To solve such probability problems, the counting rule can
be stated as follows:

If an event A can occur in n₁ ways and after its occurrence, event B
can occur in n₂ ways, then both events can occur in a total of n₁ × n₂
different ways in a given order of occurrences.


For example, if we toss a coin 3 times, there are two possible outcomes
for each toss. Hence the total number of possible outcomes is 2 × 2 × 2 = 8.
This can be illustrated as follows:

     Outcome
1    HHH
2    HHT
3    HTH
4    HTT
5    TTT
6    THH
7    THT
8    TTH

Accordingly, the fundamental counting principle can be expanded as follows:


If there are k separate parts of an experiment, and the first part can
be done in n₁ ways, the second successive part in n₂ ways, ..., and the kth
successive part in nₖ ways, then the total number of possible outcomes is
given by the following product:

n₁ × n₂ × ... × nₖ
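
The counting principle can be checked by brute-force enumeration. The following Python sketch is an illustration only; the helper name count_outcomes is a hypothetical choice. It lists the sample space for three coin tosses and compares the size of each enumerated sample space with the product of the number of ways for each part.

```python
from itertools import product
from math import prod

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]


def count_outcomes(ways_per_part):
    """Fundamental counting principle: multiply the ways for each part."""
    return prod(ways_per_part)


# Enumerate the sample spaces explicitly and compare with the product rule.
three_tosses = ["".join(t) for t in product(coin, repeat=3)]
two_dice = list(product(die, repeat=2))
four_rolls = list(product(die, repeat=4))

print(three_tosses)
print(len(three_tosses), count_outcomes([2, 2, 2]))   # 8 8
print(len(two_dice), count_outcomes([6, 6]))          # 36 36
print(len(four_rolls), count_outcomes([6] * 4))       # 1296 1296
```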


1.3.2 Classes of Sets 


Sets are fundamental objects used to define all other concepts in mathematics.
Sets are taken as something self-understood: a standard concept governed by
formal axioms. In earlier days ‘class’ and ‘set’ were not distinguished, as they
are now.

In set theory, a collection of sets or other mathematical objects whose members
share a common property is said to form a class. A class in modern set theory
is an arbitrary collection of elements of the universe. Thus all sets are
classes, since these are collections of elements of the universe, but not all
classes are sets. A class that is not a set is called a proper class.

Set theory has its own language that defines the concept of membership. In 
real world, we deal with different kinds of sets. These may be sets composed of 
numbers, points, functions, or other sets. 

The concept of a class became more important after the advent of computer
science and various programming languages. Object-oriented programming
languages use classes and objects while defining functions that perform a
particular task: a set of inputs goes through a set of processes and gives a set
of outputs.

Sets fall into two classes: basic sets, which are typically simple sets that
form a base for the topology, and denotable sets, which are unions of basic sets.


All geometrical shapes represent classes of sets. For example, a circle is the
set of points in a plane whose distance from a fixed point is constant; in
three-dimensional space the same condition defines a sphere. A parabola is the
set of points in a plane whose distance from a fixed point, known as the focus,
is the same as their distance from a line known as the directrix. Similarly,
other geometrical shapes such as the ellipse, hyperbola, sphere, ellipsoid,
paraboloid, etc., are defined in the language of sets. These sets are not finite,
since infinitely many points lie on these plane curves or three-dimensional
bodies.


The fundamental geometric operations are contains(Set, State),
disjoint(Set, Set) and subset(Set, Set). Fuzzy basic sets, which are sets of sets
defined using interval data, are used to store intermediate results if these
cannot be computed exactly. These sets can be converted to ordinary basic sets
by over- or under-approximation. Alternatively, the fundamental binary
predicates can be computed directly for the fuzzy set types. Denotable sets are
never fuzzy.


Basic sets are so-called because they form a base for the topology of the 
space. Typically, basic sets are (a subclass of) convex polytopes or ellipsoids. 
Basic sets must support the fundamental geometric predicates, both within the 
class and with the Rectangle class. Additionally, basic sets may support the optional 
geometric operations, but only if the class of set under consideration is closed 
under that operation. The result may be exactly computable if it involves no 
arithmetic (for example, intersection of two rectangles) or may need to be 
represented by a fuzzy set (for example, Minkowski sum of two rectangles). 
Denotable Sets: A denotable set implements a set as a union of basic sets. The
specification of the denotable set concept is given by the class followed by ::
and the name of the set that it contains.


Predicates and Operations on Sets 


Predicates on Sets 


The fundamental geometric predicates are:
• contains(): test whether a set contains a point.
• disjoint(): test whether two sets are disjoint.
• subset(): test whether a set is a subset of another.
• intersects(): test whether two sets intersect.
• superset(): test whether the second set is a subset of the first.


All these predicates return a fuzzy logic value, since the result of the test may 
be impossible to determine at the given precision. 


1.3.3 Power Sets 


A power set is a class of sets: the collection of all the subsets that can be
formed from the members of a set, denoted by A or any other letter. The number
of subsets in the power set is 2ⁿ, where n is the cardinality of the set, i.e.,
the number of members of the set. For example, if a set contains three members
then the number of subsets that it can form is 2³ = 8. There are 8 subsets in
the power set of set A when #(A), n(A) or |A| = 3.


For example, let there be a set A = {1, 2, 3}; then the power set contains the
subsets { {1, 2, 3}, {1, 2}, {1, 3}, {2, 3}, {1}, {2}, {3}, { } }. Here both the
null set, in which there is no element, and the set A itself, in which all the
elements are present, are included.
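
A power set can be generated mechanically by taking every combination of 0, 1, ..., n members. The Python sketch below is one way to do this, not the only one; the function name power_set is an illustrative choice.

```python
from itertools import combinations


def power_set(elements):
    """Return all subsets of `elements` as a list of sets, smallest first."""
    items = sorted(elements)
    return [set(combo)
            for size in range(len(items) + 1)
            for combo in combinations(items, size)]


subsets = power_set({1, 2, 3})
for s in subsets:
    print(s if s else "{ }")

print(len(subsets))  # 8, i.e. 2 ** 3
```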


Fuzzy Sets: This kind of set has varying degrees of membership. It was proposed
in 1965 by L.A. Zadeh and it is an extension of the conventional notion of a set.
In conventional set theory an element is either in a set or not in a set, but in
a fuzzy set there is a grade of association.


If there is a set S = {x₁, ..., xₙ}, the fuzzy set (S, m) is shown as

{m(x₁)/x₁, ..., m(xₙ)/xₙ}.

If m(x) = 0 for some x, then that member is not in the set, and m(x) = 1 shows
full membership in the fuzzy set. Any value between 0 and 1 shows a varying
degree of association of the member with the fuzzy set.


Operation on Fuzzy Sets 


Union: The union of two fuzzy sets S₁ and S₂ having membership functions m₁ and
m₂ is given by max(m₁(x), m₂(x)). This operation resembles the OR operation in
Boolean algebra.


Intersection: The intersection of two fuzzy sets S₁ and S₂ having membership
functions m₁ and m₂ is given by min(m₁(x), m₂(x)). This operation resembles the
AND operation in Boolean algebra.


Complement: This operation is the negation of the specified membership
function, 1 - m(x), and shows the negation criterion. This operation is like
the NOT operation in Boolean algebra.


Rules which are common in classical set theory also apply to fuzzy set theory. 
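
As a rough illustration of the three operations, the following Python sketch stores a fuzzy set as a dictionary from member to membership grade. The sets s1 and s2 and their grades are hypothetical, and the complement is taken as 1 - m(x), the usual convention, which is an assumption here.

```python
def fuzzy_union(m1, m2):
    """Grade of x in the union is the maximum of its two grades (OR)."""
    return {x: max(m1.get(x, 0.0), m2.get(x, 0.0)) for x in m1.keys() | m2.keys()}


def fuzzy_intersection(m1, m2):
    """Grade of x in the intersection is the minimum of its two grades (AND)."""
    return {x: min(m1.get(x, 0.0), m2.get(x, 0.0)) for x in m1.keys() | m2.keys()}


def fuzzy_complement(m, universe):
    """Grade of x in the complement is 1 minus its grade (NOT)."""
    return {x: 1.0 - m.get(x, 0.0) for x in universe}


# Hypothetical membership grades over the universe {"a", "b", "c"}.
s1 = {"a": 0.25, "b": 0.75, "c": 1.0}
s2 = {"a": 0.5, "b": 0.5, "c": 0.0}

print(fuzzy_union(s1, s2))
print(fuzzy_intersection(s1, s2))
print(fuzzy_complement(s1, ["a", "b", "c"]))
```

When every grade is exactly 0 or 1, these functions reduce to the ordinary union, intersection and complement of classical sets.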
Important Terms associated with Fuzzy Set 


Universe of Discourse: It is the range of input values for a fuzzy logic system.


Fuzzy Set: A set that allows a degree of association from 0 to 1. Zero (0) shows
no association and 1 shows full association.


Fuzzy Singleton: It is a fuzzy set having a single point with a membership of 1.
Example 33. Classification of dwelling units.


Problem Statement. A builder wants to classify the flats that he is building and
intends to sell to home seekers. The level of comfort is given by the number of
bedrooms in a flat. Let U represent the set of available flats, given as:


U = {u | u is an integer, 1 ≤ u ≤ 10}. Flats are denoted by ‘u’, the number of
bedrooms in a dwelling unit. The builder gives a comfort level for ‘a family of
four’.


Solution. A comfortable flat for a ‘family of four’ is described by a fuzzy set as
given below:


FlatForFour = FuzzySet[{{1, 0.2}, {2, 0.5}, {3, 0.8}, {4, 1},
{5, 0.7}, {6, 0.3}}, UniversalSpace -> {1, 10}]


FuzzyPlot[FlatForFour, ShowDots -> True]


[Figure: plot of Membership Grade (0 to 1) against U for FlatForFour]

Example 34. Problem on age. Range is from 0 to 100.


Problem Statement. Here a fuzzy set is used for representing age, which ranges
from 0 to 100.


Solution. This can be discussed with the help of fuzzy sets, and for that we set
the universal space for age to have a range from 0 to 100.


SetOptions[FuzzySet, UniversalSpace -> {0, 100}]
Example 35. Problem on youth age. Range is from 0 to 40.
Problem Statement. Represent the concept of youth by a fuzzy set.
Solution. Youth is defined as a fuzzy set.
Young = FuzzyTrapezoid[0, 0, 25, 40]


In a similar way, the property of ‘being old’ can also be given as a fuzzy set.
This is as below:


Old = FuzzyTrapezoid[50, 65, 100, 100]
Example 36. This example shows the operation ‘intersection’ on fuzzy sets.
Problem Statement. Define the concept of ‘middle-aged’ using fuzzy sets.


Solution. Middle-aged means ‘not young’ AND ‘not old’. This requires use of the
operators NOT and AND. Thus middle age is found as ‘not young’ (complement of
Young) AND (conjunction, i.e., intersection) ‘not old’ (complement of Old), using
the fuzzy sets for youth and old age from Example 35.


MiddleAged = Intersection[Complement[Young], Complement[Old]];


We can also use the function FuzzyPlot to obtain a graphical presentation of
the three age sets.


FuzzyPlot[Young, MiddleAged, Old, PlotJoined -> True];


[Figure: plot of Membership Grade (0 to 1) against U (0 to 100) for Young,
MiddleAged and Old]


The graph shows that the intersection of ‘not young’ and ‘not old’ gives a
reasonable definition for the concept of ‘middle-aged.’
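
The same construction can be sketched outside the fuzzy logic package. In the Python sketch below, trapezoid is a hypothetical stand-in for a trapezoidal membership function and is not the package's own implementation; the four parameters used for youth, (0, 0, 25, 40), are an assumption, and the complement is taken as 1 minus the grade.

```python
def trapezoid(a, b, c, d):
    """Membership function: 0 outside [a, d], 1 on [b, c], linear in between."""
    def membership(x):
        if x < a or x > d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)   # rising edge
        return (d - x) / (d - c)       # falling edge
    return membership


young = trapezoid(0, 0, 25, 40)      # assumed parameters for youth
old = trapezoid(50, 65, 100, 100)


def middle_aged(x):
    """Intersection (min) of 'not young' and 'not old'."""
    return min(1.0 - young(x), 1.0 - old(x))


for age in (20, 30, 45, 57.5, 80):
    print(age, round(middle_aged(age), 3))
```

Ages between 40 and 50 receive full membership, while anyone who is fully young or fully old receives grade 0.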


Example 37. Fuzzy set with natural numbers.


Problem Statement. To define a set of natural numbers in the neighbourhood of
6. This can be done in different ways.


Solution. Define a fuzzy set of numbers adjacent to 6, in which each grade
expresses the number's closeness to 6.
SetOptions[FuzzySet, UniversalSpace -> {0, 20}];

Six1 = FuzzySet[{{3, 0.1}, {4, 0.3}, {5, 0.6}, {6, 1.0},
{7, 0.6}, {8, 0.3}, {9, 0.1}}]

FuzzyPlot[Six1];

Solution. A second solution uses the function FuzzyTrapezoid to create a fuzzy
set. If we think that a triangular fuzzy set is perhaps a better choice, then we
set the two middle parameters to 6, which expresses closeness to the number 6.
We define this set as Six2:


Six2 = FuzzyTrapezoid[2, 6, 6, 10];


FuzzyPlot[Six2]; This shows the set of points according to this definition.


[Figure: plot of Membership Grade against U for Six2]


Solution. There may be a third solution, which creates a fuzzy set from a
function defining nearness to 6.


CloseTo[x_] := 1/(1 + (#1 - x)^2) &


We name it Six3.
Six3 = CreateFuzzySet[CloseTo[6]];


FuzzyPlot[Six3];


[Figure: plot of Membership Grade against U for Six3]

Solution. There may be a fourth solution which uses a ‘piece-wise defined
function’.

NearSix[x_] := Which[x == 6, 1, x > 6 && x < 12, 1/(x - 5),
x < 6 && x > 0, 1/(7 - x), True, 0]
Six4 = CreateFuzzySet[NearSix];


FuzzyPlot[Six4];


[Figure: plot of Membership Grade against U for Six4]
Example 38. Problem on disjunctive sum.
Problem. To find the disjunctive sum of two fuzzy relations.


RMat = {{.8, .3, .5, .2}, {.4, 0, .7, .3}, {.6, .2, .8, .6}}
This shows one membership matrix.


SMat = {{.9, .5, .8, 1}, {.4, .6, .7, .5}, {.7, .8, .8, .7}}


This shows another membership matrix.


R = FromMembershipMatrix[RMat]; This defines the relation in RMat.

S = FromMembershipMatrix[SMat]; This defines the relation in SMat.

FuzzyPlot3D[R, S, ShowDots -> True] This shows the location of the points.


[Figure: three-dimensional plot of Grade for the relations R and S]

Let there be two fuzzy relations R and S in the universal space V × W.


The disjunctive sum is given by: DisSum = (R ∩ S') ∪ (R' ∩ S), which is a
relation in V × W, with the following property:
for all (v, w) ∈ V × W,


DisSum(v, w) = Max(Min(R(v, w), 1 - S(v, w)), Min(1 - R(v, w), S(v, w)))

DisSum = Union[Intersection[R, Complement[S]],
Intersection[Complement[R], S]];
FuzzyPlot3D[DisSum];


ToMembershipMatrix[DisSum] // MatrixForm


0.2  0.5  0.5  0.8
0.4  0.6  0.3  0.5
0.4  0.8  0.2  0.4
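
The disjunctive sum formula can be applied entry by entry to two membership matrices. The Python sketch below is a minimal illustration; the function name disjunctive_sum and the two small matrices r and s are hypothetical and are not the matrices of the example.

```python
def disjunctive_sum(r, s):
    """Entry-wise max(min(R, 1 - S), min(1 - R, S)) for two equal-sized matrices."""
    return [[max(min(a, 1 - b), min(1 - a, b)) for a, b in zip(row_r, row_s)]
            for row_r, row_s in zip(r, s)]


# Hypothetical 2 x 2 membership matrices.
r = [[1.0, 0.0],
     [0.5, 0.25]]
s = [[1.0, 1.0],
     [0.5, 0.75]]

for row in disjunctive_sum(r, s):
    print(row)
# [0.0, 1.0]
# [0.5, 0.75]
```

On grades that are exactly 0 or 1 (the first row), the disjunctive sum behaves like the exclusive OR of classical logic.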


Example 39. Problem on distance.


Problem Statement. We take a fuzzy relation R on sets X and Y. We define these
sets as: X = {Delhi, Moscow} and Y = {Dacca, Delhi, London}. The relation R
represents the idea of being ‘far’. The relation may be shown as
R(X, Y) = 1.0/(Delhi, Dacca) + 0/(Delhi, Delhi) + 0.6/(Delhi, London)
+ 0.9/(Moscow, Dacca) + 0.7/(Moscow, Delhi) + 0.3/(Moscow, London).


Solution. Such a fuzzy relation is represented as below. A membership matrix
is created to depict the relationship. For this, each city in a set is
represented by a number. For X we keep 1 for Delhi and 2 for Moscow. In Y we
set 1 for Dacca, 2 for Delhi and 3 for London.

After this, we are in a position to create the relation using the function
FromMembershipMatrix, and the relation can be plotted using the function
FuzzyPlot3D.

DistMat = {{1, 0, 0.6}, {0.9, 0.7, 0.3}}

{{1, 0, 0.6}, {0.9, 0.7, 0.3}}

DistRel = FromMembershipMatrix[DistMat, {{1, 2}, {1, 3}}]

FuzzyRelation[{{{1, 1}, 1}, {{1, 2}, 0}, {{1, 3}, 0.6},
{{2, 1}, 0.9}, {{2, 2}, 0.7}, {{2, 3}, 0.3}},
UniversalSpace -> {{1, 2, 1}, {1, 3, 1}}]


This relation can be plotted using the FuzzyPlot3D function.
FuzzyPlot3D[DistRel,
AxesLabel -> {"X", "Y", "Grade"},
ViewPoint -> {2, 0, 1},
AxesEdge -> {{-1, -1}, {1, -1}, {1, 1}}];


ToMembershipMatrix[DistRel] // MatrixForm

Example 40. To choose a job. Fuzzy sets can help to choose between four given
jobs. Let these jobs be numbered 1 to 4.


Problem Statement. We have to make a selection such that the job provides the
best salary and at the same time is near to the place of stay.


Solution. The first part of the selection criteria is given by the following
definition of a fuzzy set. By this criterion, Job 3 looks the most attractive
of the four jobs and Job 1 the least attractive.


Interest = FuzzySet[{{1, .4}, {2, .6}, {3, .8}, {4, .6}},
UniversalSpace -> {1, 4}]


FuzzySet[{{1, 0.4}, {2, 0.6}, {3, 0.8}, {4, 0.6}},
UniversalSpace -> {1, 4, 1}]


Similarly, a fuzzy set can be created for the second part of the selection
criteria. We create a set named Drive for the distance from the working place.
Here membership has a varying grade depending on distance, and the least
distance is desired. By this second criterion, Job 4 is the most attractive, as
it is the nearest of the four, and Job 1 is the farthest.


Drive = FuzzySet[{{1, .1}, {2, .9}, {3, .7}, {4, 1}},
UniversalSpace -> {1, 4}]


FuzzySet[{{1, 0.1}, {2, 0.9}, {3, 0.7}, {4, 1}},
UniversalSpace -> {1, 4, 1}]


As the goal is to get a good salary, we find that Job 1 has the highest salary
and Job 4 the lowest.


Salary = FuzzySet[{{1, .875}, {2, .7}, {3, .5}, {4, .2}},
UniversalSpace -> {1, 4}]


FuzzySet[{{1, 0.875}, {2, 0.7}, {3, 0.5}, {4, 0.2}},
UniversalSpace -> {1, 4, 1}]


We have examined all the criteria one by one using fuzzy sets, but have not yet
done anything to reach a final decision. So a decision function is defined
using intersection. Applying the intersection operation to the constraints and
goals gives the best decision. By plotting the fuzzy set for the decision we
can visualize the result graphically. Considering all these, the decision
function says that Job 2 is the best.


Decision = Intersection[Interest, Drive, Salary]


FuzzySet[{{1, 0.1}, {2, 0.6}, {3, 0.5}, {4, 0.2}},
UniversalSpace -> {1, 4, 1}]


We can plot the decision fuzzy set to see the results graphically.


FuzzyPlot[Decision];


[Figure: plot of Membership Grade against job number for Decision]
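
The decision step is an intersection, so each job receives the minimum of its three grades. The Python sketch below reproduces this with plain dictionaries; the variable names are illustrative, and the salary grade of 0.5 for Job 3 is taken from the printed output of the example, which is an assumed reading of the source.

```python
# Membership grades for Jobs 1 to 4 under each criterion.
interest = {1: 0.4, 2: 0.6, 3: 0.8, 4: 0.6}
drive = {1: 0.1, 2: 0.9, 3: 0.7, 4: 1.0}
salary = {1: 0.875, 2: 0.7, 3: 0.5, 4: 0.2}   # grade for Job 3 assumed to be 0.5

# Intersection of fuzzy sets: the grade of each job is its smallest grade.
decision = {job: min(interest[job], drive[job], salary[job]) for job in interest}
best_job = max(decision, key=decision.get)

print(decision)   # {1: 0.1, 2: 0.6, 3: 0.5, 4: 0.2}
print(best_job)   # 2
```

Because the minimum is used, a job is only as good as its weakest criterion: Job 1 has the best salary but is ruled out by its long drive.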
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Power Set 


The set of all subsets of a given set A is called the power set of A and is
denoted by P(A). The name power set is motivated by the fact that ‘if A has n
elements then its power set P(A) contains exactly 2ⁿ elements.’

Example 41. If A = {1, 2}, find P(A).

Solution. Now ∅ is a subset of A. A is also a subset of A. {1} and {2} are also
subsets of A. These are all the subsets of A. So, P(A) = {∅, {1}, {2}, A}.
Therefore P(A) has 2² = 4 elements.

Example 42. Let A = {1, 2, 3}. Find P(A).

Solution. Now subsets consisting of one element only are {1}, {2}, {3}. Subsets
consisting of two elements only are {1, 2}, {2, 3}, {1, 3}. Also ∅ and A are
subsets of A.


So P(A) = {∅, {1}, {2}, {3}, {1, 2}, {2, 3}, {1, 3}, A} and the number of
elements in P(A) is 2³ = 8.
Example 43. Let B be a subset of A. Let P(A : B) = {X ∈ P(A) | B ⊆ X}. If
B = {1, 2} and A = {1, 2, 3, 4, 5}, list all the elements of P(A : B).
Solution. Clearly B ⊆ {1, 2}, B ⊆ {1, 2, 3}, B ⊆ {1, 2, 3, 4},
B ⊆ {1, 2, 3, 4, 5}, B ⊆ {1, 2, 4}, B ⊆ {1, 2, 5}, B ⊆ {1, 2, 3, 5},
B ⊆ {1, 2, 4, 5}. These give all the elements of P(A : B).
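
The elements of P(A : B) can also be generated rather than listed by hand: every subset of A that contains B is B together with some subset of A - B. The Python sketch below follows this idea; the function name supersets_within is an illustrative choice.

```python
from itertools import combinations


def supersets_within(a, b):
    """Return every X with B ⊆ X ⊆ A, i.e. the elements of P(A : B)."""
    a, b = set(a), set(b)
    if not b <= a:
        raise ValueError("B must be a subset of A")
    rest = sorted(a - b)
    return [b | set(extra)
            for size in range(len(rest) + 1)
            for extra in combinations(rest, size)]


members = supersets_within({1, 2, 3, 4, 5}, {1, 2})
for x in members:
    print(sorted(x))

print(len(members))  # 8, i.e. 2 ** 3, one for each subset of {3, 4, 5}
```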
Duality
Union (∪) and intersection (∩) of sets are termed dual operations.


If the validity of a law involving one of the two, ∪ or ∩, is established, then
the dual obtained by replacing ∪ by ∩ and ∩ by ∪ is also established.


Partition of a Set
The partition of a set A is written as,

A = {A₁, A₂, ..., Aₙ}
where Aⱼ ⊆ A, j = 1, 2, ..., n, i.e., the Aⱼ's are included in A.
Thus, (i) A₁, A₂, ..., Aₙ are subsets of A.

(ii) Aⱼ ∩ Aₖ = ∅ for j ≠ k, where j = 1, 2, ..., n and k = 1, 2, ..., n,

i.e., any two of the sets Aⱼ, Aₖ are disjoint.

(iii) A₁ ∪ A₂ ∪ ... ∪ Aₙ = A, i.e., A₁, A₂, ..., Aₙ are exhaustive.
Thus every element of A is a member of one and only one of the subsets in the
partition.
Any sample space S can be written as,


S = {A, A'}


= {A ∩ B, A ∩ B', A' ∩ B, A' ∩ B'}


In any exercise, if (i), (ii), (iii) are all satisfied for a set
A = {A₁, A₂, ..., Aₙ} then this represents the partition of A.
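
Conditions (i), (ii) and (iii) translate directly into a small test. In the Python sketch below, the function name is_partition and the die-rolling events used as data are illustrative assumptions.

```python
def is_partition(blocks, whole):
    """Check conditions (i) subsets, (ii) pairwise disjoint, (iii) exhaustive."""
    whole = set(whole)
    blocks = [set(b) for b in blocks]
    subsets_ok = all(b <= whole for b in blocks)
    disjoint_ok = all(blocks[i].isdisjoint(blocks[j])
                      for i in range(len(blocks))
                      for j in range(i + 1, len(blocks)))
    exhaustive_ok = set().union(*blocks) == whole
    return subsets_ok and disjoint_ok and exhaustive_ok


# Sample space of one die, with A = even number and B = larger than 4.
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
B = {5, 6}
cells = [A & B, A - B, B - A, S - (A | B)]   # A∩B, A∩B', A'∩B, A'∩B'

print(cells)
print(is_partition([A, S - A], S))   # True
print(is_partition(cells, S))        # True
print(is_partition([A, B], S))       # False: A and B overlap and miss 1 and 3
```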


Check Your Progress 


7. Define the terms set and element. 
8. Differentiate between finite set and infinite set. 
9. What is singleton set? 
10. Define the term subset. 
11. What do you mean by null set? 
12. What are Venn diagrams? Why are they called so? 
13. What is union of sets? 
14. Define intersection of sets. 
15. What does complement of sets mean? 


16. What is a power set? 


1.4 CONDITIONAL PROBABILITY AND 
INDEPENDENCE 


In many situations, a manager may know the outcome of an event that has 
already occurred and may want to know the chances of a second event occurring 
based upon the knowledge of the outcome of the earlier event. We are interested 
in finding out as to how additional information obtained as a result of the 
knowledge about the outcome of an event affects the probability of the occurrence 
of the second event. For example, let us assume that a new brand of toothpaste 
is being introduced in the market. Based on the study of competitive markets, 
the manufacturer has some idea about the chances of its success. Now, he 
introduces the product in a few selected stores in a few selected areas before 
marketing it nationally. A highly positive response from the test-market area will 
improve his confidence about the success of his brand nationally. Accordingly, 
the manufacturer’s assessment of high probability of sales for his brand would 
be conditional upon the positive response from the test-market. 


Let there be two events A and B. Then the probability of event A given
the outcome of event B is given by:


P[A/B] = P[AB] / P[B]


where P[A/B] is interpreted as the probability of event A on the condition that
event B has occurred, P[AB] is the joint probability of event A and event
B, and P[B] is not equal to zero.
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As an example, let us suppose that we roll a die and we know that the 
number that came up is larger than 4. We want to find out the probability that 
the outcome is an even number given that it is larger than 4. 


Let, event A = Even
And event B = Larger than 4
Then, P[Even / Larger than 4] = P[Even and larger than 4] / P[Larger than 4]

Or, P[A/B] = P[AB] / P[B] = (1/6) / (2/6) = 1/2
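
The die calculation can be reproduced by counting outcomes. The Python sketch below uses exact fractions; the helper names prob and cond_prob are illustrative choices, and all six faces are assumed equally likely.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
even = {x for x in sample_space if x % 2 == 0}       # event A
larger_than_4 = {x for x in sample_space if x > 4}   # event B


def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


def cond_prob(a, b):
    """P[A/B] = P[AB] / P[B], defined only when P[B] is not zero."""
    if prob(b) == 0:
        raise ValueError("P[B] must not be zero")
    return prob(a & b) / prob(b)


print(prob(even & larger_than_4))      # 1/6
print(prob(larger_than_4))             # 1/3
print(cond_prob(even, larger_than_4))  # 1/2
```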

But for independent events, P[AB] = P[A]P[B]. Thus substituting this
relationship in the formula for conditional probability, we get:


P[A/B] = P[AB] / P[B] = P[A]P[B] / P[B] = P[A]


This means that P[A] will remain the same no matter what the outcome 
of event B is. For example, if we want to find out the probability of a head on 
the second toss of a fair coin, given that the outcome of the first toss was a 
head, this probability would still be 1/2 because the two events are independent 
events and the outcome of the first toss does not affect the outcome of the 
second toss. 


1.4.1 Independent and Dependent Events 


Two events A and B are said to be independent events, if the occurrence of one 
event is not influenced at all by the occurrence of the other. For example, if two 
fair coins are tossed, then the result of one toss is totally independent of the 
result of the other toss. The probability that a head will be the outcome of any 
one toss will always be 1/2, irrespective of whatever the outcome is of the other 
toss. Hence, these two events are independent. 


Let us assume that one fair coin is tossed 10 times and it happens that the 
first nine tosses resulted in heads. What is the probability that the outcome of 
the tenth toss will also be a head? There is always a psychological tendency to 
think that a tail would be more likely in the tenth toss since the first nine tosses 
resulted in heads. However, since the events of tossing a coin 10 times are all 
independent events, the earlier outcomes have no influence whatsoever on the 
result of the tenth toss. Hence the probability that the outcome will be a head 
on the tenth toss is still 1/2. 


On the other hand, consider drawing two cards from a pack of 52 playing 
cards. The probability that the second card will be an ace would depend upon 
whether the first card was an ace or not. Hence these two events are not 
independent events. 
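
The dependence between the two card draws can be made concrete by enumerating every ordered pair of distinct cards. The Python sketch below is an illustration under the assumption that the two cards are drawn without replacement and that all ordered pairs are equally likely; the names used are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(rank, suit) for rank in ranks for suit in suits]
draws = list(permutations(deck, 2))   # ordered pairs of distinct cards


def is_ace(card):
    return card[0] == "A"


def second_ace_given(first_condition):
    """P[second card is an ace / first card satisfies the condition]."""
    given = [d for d in draws if first_condition(d[0])]
    hits = [d for d in given if is_ace(d[1])]
    return Fraction(len(hits), len(given))


p_after_ace = second_ace_given(is_ace)
p_after_other = second_ace_given(lambda card: not is_ace(card))

print(p_after_ace)     # 1/17, i.e. 3/51
print(p_after_other)   # 4/51
```

The two conditional probabilities differ, so the second draw is not independent of the first.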


Independent Repeated Trials 


Probability is a measure of relative frequency. According to definition, the 
probability of an event E is equal to the number of equally likely ways E can 
occur divided by the total number of equally likely things which can occur. 


Combinatorics is important in the study of independent repeated trials, where
an event consists of repeated trials. Tossing a fair coin many times is an
example of this. Suppose that a fair coin is tossed 10 times. Now
the probability of the coin landing on heads for each toss is 1/2, since there are 
two possible equally-likely outcomes (heads or tails) and just one way it can come 
up heads. Furthermore, each toss of the coin is independent of every other toss. 
Basically this defines that the coin has no memory and the probability of the coin 
landing on heads for any given toss will be always 1/2. It has no relation with the 
outcome history of the previous tosses. Suppose the coin had just landed on heads 
8 times in a row. Is it possible that the coin is more likely to land on tails on the next 
toss? The probability of it landing on heads is always 1/2, since the coin has no 
memory. It is true that in the long run, 50 per cent of the tosses will be heads and 
50 per cent tails, but this is not achieved by the coin making up for any deficit of 
heads or tails but rather by turning up heads roughly half the time in all future 
tosses. 


In repeated independent trials with the same probability of success, the
fraction of successes approaches the probability of success in each trial as
the number of trials increases. This is known as the Law of Large Numbers.
According to this law, if a discrete random variable is observed repeatedly in
independent experiments, then the fraction of experiments for which the random
variable equals any of its possible values approaches the probability that the
random variable equals that value.
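
A small simulation illustrates this behaviour for a fair coin. The Python sketch below is only an illustration: the seed, the sample sizes and the function name heads_fraction are arbitrary choices, and the printed fractions depend on the seed.

```python
import random


def heads_fraction(n_tosses, rng):
    """Toss a simulated fair coin n_tosses times; return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


rng = random.Random(2024)   # fixed seed so that a run can be repeated
fractions_by_n = {n: heads_fraction(n, rng) for n in (10, 100, 10_000, 100_000)}

for n, fraction in fractions_by_n.items():
    print(n, fraction)
```

For small numbers of tosses the fraction can be far from 1/2, while for large numbers it settles close to 1/2, without any individual toss compensating for earlier ones.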


Check Your Progress 


17. Explain the concept of independent events. 
18. Define binomial distribution. 


19. What is a random variable? 


1.5 ANSWERS TO CHECK YOUR PROGRESS 
QUESTIONS 


1. The term simple probability refers to a phenomenon where only a simple 
or an elementary event occurs. For example, assume that event (E), the 
drawing of a diamond card from a pack of 52 cards, is a simple event. 
Since there are 13 diamond cards in the pack and each card is equally likely 
to be drawn, the probability of event (E) or P[E] = 13/52 or 1/4. 
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The term joint probability refers to the phenomenon of occurrence of two 
or more simple events. For example, assume that event (E) is a joint event 
(or compound event) of drawing a black ace from a pack of cards. There 
are two simple events involved in the compound event, which are: the card 
being black and the card being an ace. Hence, P[Black ace] or P[E] = 2/52,
since there are two black aces in the pack.


2. The classical theory of probability is the theory based on the number of
favourable outcomes and the number of total outcomes. The probability is
expressed as a ratio of these two numbers. The term ‘favourable’ is not the
subjective value given to the outcomes, but is rather the classical terminology
used to indicate that an outcome belongs to a given event of interest.


3. The addition rule states that when two events are mutually exclusive, then
the probability that either of the events will occur is the sum of their separate
probabilities. For example, if you roll a single die then the probability that
it will come up with a face 5 or face 6, where event A refers to face 5 and
event B refers to face 6, both events being mutually exclusive events, is
given by,
P[A or B] = P[A] + P[B]
Or, P[5 or 6] = P[5] + P[6]
= 1/6 + 1/6
= 2/6 = 1/3


4. Multiplication rule is applied when it is necessary to compute the probability
in case two events occur at the same time.


5. Bayes’ theorem on probability is concerned with a method for estimating
the probability of causes which are responsible for the outcome of an
observed effect. The theorem contributes to the statistical decision theory
in revising prior probabilities of outcomes of events based upon the
observation and analysis of additional information.


6. Two events are said to be mutually exclusive, if both events cannot occur at
the same time as the outcome of a single experiment. For example, if we
toss a coin, then either event head or event tail would occur, but not both.
Hence, these are mutually exclusive events.


7. ‘A set is any collection of objects such that given an object, it is possible to
determine whether that object belongs to the given collection or not.’


The members of a set are called elements. We use capital letters to denote
sets and small letters to denote elements. We always use { } brackets to
denote a set.


8. A set which has a finite number of elements is called a finite set, else it is
called an infinite set.


9. A set having only one element is called a singleton. If a is the element of the
singleton A, then A is denoted by A = {a}.


10. Let A and B be two sets. If every element of A is an element of B, then A is
called a subset of B and we write A ⊆ B or B ⊇ A (read as ‘A is contained in
B’ or ‘B contains A’).


11. A set which has no element is called the null set or empty set. It is denoted
by the symbol ∅.

12. Venn diagrams are used to illustrate various set operations. They are named
after John Venn (1834-1923).


13. The union of any two sets A and B is the set of all those elements x such that
x belongs to at least one of the two sets A and B. It is denoted by A ∪ B.
Logically speaking, if the biconditional statement (x ∈ C) ⇔ (x ∈ A) ∨ (x ∈ B)
is true for all x, then C = A ∪ B. In other words, (x ∈ A ∪ B) ⇔ (x ∈ A) ∨
(x ∈ B).

14. The intersection of two sets A and B is the set of all those elements x such
that x belongs to both A and B and is denoted by A ∩ B. If A ∩ B = ∅, then A
and B are said to be disjoint.


Logically speaking, if the biconditional statement (x ∈ C) ⇔ (x ∈ A) ∧ (x ∈ B)
is true for all x, then C = A ∩ B. Hence,


(x ∈ A ∩ B) ⇔ (x ∈ A) ∧ (x ∈ B)


15. If A and B are two sets then the complement of B relative to A is the set of
all those elements x ∈ A such that x ∉ B and is denoted by A - B. Logically
speaking, if for a set C the biconditional statement (x ∈ C) ⇔ (x ∈ A) ∧
(x ∉ B) is true for all x, then C = A - B. In other words, if (x ∈ C) ⇔ (x ∈ A)
∧ (x ∉ B) then C is called the complement of B relative to A.


16. The set of all subsets of a given set A is called the power set of A and is
denoted by P(A). The name power set is motivated by the fact that ‘if A has
n elements then its power set P(A) contains exactly 2ⁿ elements.’


17. Two events A and B are said to be independent events, if the occurrence of
one event is not at all influenced by the occurrence of the other. For example,
if two fair coins are tossed, then the result of one toss is totally independent
of the result of the other toss. The probability that a head will be the outcome
of any one toss will always be 1/2, irrespective of whatever the outcome is
of the other toss. Hence, these two events are independent.

18. Binomial distribution is one of the simplest and most frequently used discrete
probability distributions and is very useful in many practical situations
involving either/or types of events.

19. A random variable is a phenomenon of interest in which the observed
outcomes of an activity are entirely by chance, are absolutely unpredictable
and may differ from response to response.


1.6 SUMMARY


A probability is expressed as a real number p ∈ [0, 1]; it may equivalently
be stated as a percentage (0 per cent to 100 per cent).


The classical theory of probability is the theory based on the number of 
favourable outcomes and the number of total outcomes. 


If the number of outcomes belonging to an event E is $N_E$, and the total
number of outcomes is N, then the probability of event E is defined as
$P(E) = N_E / N$.


The classical approach cannot be used to calculate probabilities when the
outcomes are not equally likely.


In the relative frequency approach, there is no guarantee that the sequence
$N_E / N$ will converge to the same limit every time, or that it will
converge at all.


The axiomatic probability theory is the most general approach to probability, 
and is used for more difficult problems in probability. 


The empirical approach to determining probabilities relies on data from 
actual experiments to determine approximate probabilities instead of the 
assumption of equal likeliness. 


The relationship between these empirical probabilities and the theoretical
probabilities is described by the Law of Large Numbers. The law states
that as the number of trials of an experiment increases, the empirical
probability approaches the theoretical probability.


The multiplication rule is applied when it is necessary to compute the
probability that both events A and B occur together.


Bayes’ theorem makes use of conditional probability formula where the 
condition can be described in terms of the additional information which 
would result in the revised probability of the outcome of an event. 


A sample space is the collection of all possible events or outcomes of an 
experiment. 

An event is an outcome or a set of outcomes of an activity or a result of a
trial.

Two mutually exclusive events are said to be complementary if, between
themselves, they exhaust all possible outcomes.

A probability space is a measure space such that the measure of the
whole space is equal to 1. A simple finite probability space is an ordered
pair (S, p) such that S is a set and p is a function with domain S.


The members of a set are called its elements. We use capital letters to
denote sets and small letters to denote elements.


A set having only one element is called a singleton. If a is the element of
the singleton A, then A is denoted by A = {a}.


Whenever we talk of a set, we shall assume it to be a subset of a fixed set
U. This fixed set U is called the universal set.


A set which has no element is called the null set or empty set. It is denoted
by the symbol ∅.

The union of any two sets A and B is the set of all those elements x such that
x belongs to at least one of the two sets A and B.


The intersection of two sets A and B is the set of all those elements x such
that x belongs to both A and B and is denoted by A ∩ B. If A ∩ B = ∅, then
A and B are said to be disjoint.


If A and B are two sets, then the complement of B relative to A is the set of
all those elements x ∈ A such that x ∉ B and is denoted by A − B.


Two events A and B are said to be independent events, if the occurrence of 
one event is not influenced at all by the occurrence of the other. 


In repeated independent trials with the same probability of success, the
fraction of successes approaches the probability of success in each trial
as the number of trials increases.
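
The Law of Large Numbers summarised above can be illustrated with a small simulation. The following is only a sketch: the function name `empirical_probability`, the fixed seed and the trial counts are assumptions chosen for illustration and do not come from the text.

```python
import random


def empirical_probability(trials, p=0.5, seed=0):
    """Fraction of successes in `trials` independent trials,
    each with success probability p (fixed seed for repeatability)."""
    rng = random.Random(seed)
    successes = sum(1 for _ in range(trials) if rng.random() < p)
    return successes / trials


# As the number of trials grows, the fraction of successes
# should settle near the theoretical probability p = 0.5.
for n in (10, 1_000, 100_000):
    print(n, empirical_probability(n))
```

With more trials the printed fraction is expected to lie closer to 0.5, although any single run may fluctuate.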


1.7 KEY WORDS


Classical theory of probability: It is the theory of probability based on 
the number of favourable outcomes and the number of total outcomes. 


Event: It is an outcome or a set of outcomes of an activity or the result of
a trial.


Elementary event: It is the single possible outcome of an experiment. It is 
also known as a simple event. 


Joint event: It is also known as compound event and has two or more 
elementary events in it. 


Sample space: It is the collection of all possible events or outcomes of an 
experiment. 


Addition rule: It states that when two events are mutually exclusive, then 
the probability that either of the events will occur is the sum of their separate 
probabilities. 

Multiplication rule: It is applied when it is necessary to compute the
probability that both events A and B occur together. Different rules
are applied for different conditions.


Set: It is a collection of objects, such that given an object, it is possible
to determine whether that object belongs to the given collection or not.


Element: The members of a set are called its elements.


Singleton: It is a set having only one element.


Null set: It is a set with no elements.
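
The set operations summarised in this unit (union, intersection, relative complement and power set) map directly onto Python's built-in `set` type. The sketch below uses two small example sets chosen only for illustration; the helper name `power_set` is an assumption, not a library function.

```python
from itertools import combinations


def power_set(s):
    """Return all subsets of s as a list of frozensets."""
    items = list(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


A = {1, 2, 3, 4}
B = {3, 4, 5}

union = A | B           # elements in at least one of A and B
intersection = A & B    # elements in both A and B
difference = A - B      # complement of B relative to A
subsets = power_set(A)  # a set with n elements has 2**n subsets

print(union, intersection, difference, len(subsets))
```

Two sets are disjoint exactly when their intersection is the empty set, which can be checked with `A.isdisjoint(B)`.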


1.8 SELF-ASSESSMENT QUESTIONS AND EXERCISES


Short-Answer Questions 


1. Explain the concept of probability.
2. What are the different theories of probability? Explain briefly.
3. What do you understand by simple probability?
4. Explain the axiomatic approach to probability.
5. Explain the concept of multiplication rule.
6. What is Bayes’ theorem? What is its importance in statistical calculations?
7. Define sample space.
8. Explain event and its types with the help of examples.
9. What is a mutually exclusive event?
10. Explain the terms, ‘Sum of Events’ and ‘Product of Events’.
11. What is finite probability space?
12. Define set with the help of examples.
13. How will you define a universal set?
14. When are two sets termed equal?
15. Explain power set.
16. Describe union, intersection and complement set operations.
17. Explain the importance of Venn diagrams.
18. What do you understand by finite sets and counting principle?
19. Explain distributive laws and De Morgan’s laws of set theory.
20. What are the important applications of set theory?
21. Explain duality.
22. What is mathematical induction?
23. Describe the importance of conditional probability.
24. When are independent repeated trials used?
25. Explain the properties of binomial distribution.
26. What is a random variable? Differentiate between qualitative and
quantitative random variables.


Long-Answer Questions 


1. A family plans to have two children. What is the probability that both
children will be boys? (List all the possibilities and then select the one
which would be two boys.)

2. A family plans to have three children. List all the possible combinations
and find the probability that all the three children will be boys.

3. A card is selected at random from an ordinary well-shuffled pack of 52
cards. What is the probability of getting:
(i) A king
(ii) A spade
(iii) A king or an ace
(iv) A picture card

4. A wheel of fortune has numbers 1 to 40 painted on it, each number being
at equal distance from the other so that when the wheel is rotated, there
is the same chance that the pointer will point at any of these numbers.
Tickets have been issued to contestants numbering 1 to 40. The number at
which the wheel stops after being rotated would be the winning number.
What is the probability that:
(i) Ticket number 29 wins.
(ii) One person who bought 5 tickets numbered 18 to 22 inclusive wins the
prize.

5. In a computer course, the probability that a student will get an A is 0.09.
The probability that he will get a B grade is 0.15 and the probability that
he will get a C grade is 0.45. What is the probability that the student will
get either a D or an F grade?

6. The Dean of the School of Business has two secretaries, Mary and Jane.
The probability that Mary will be absent on any given day is 0.08. The
probability that Jane will be absent on any given day is 0.06. The probability
that both the secretaries will be absent on any given day is 0.02. Find the
probability that either one of them will be absent on any given day.

7. A fair die is rolled once. What is the probability of getting:
(i) An odd number.
(ii) A number greater than 3.

8. Two fair dice are rolled. What is the probability of getting:
(i) A sum of 10 or more.
(ii) A pair of which at least one number is 3.
(iii) A sum of 8, 9, or 10.
(iv) One number less than 4.

9. An urn contains 12 white balls and 8 red balls. Two balls are to be selected
in succession, at random and without replacement. What is the probability
that:
(i) Both balls are white.
(ii) The first ball is white and the second ball is red.
(iii) One white ball and one red ball are selected.
(iv) Would the probabilities change if the first ball after being identified is
put back in the urn before the second ball is selected?

10. In a statistics class, the probability that a student picked at random
comes from a two parent family is 0.65, and the probability that he will fail
the exam is 0.20. What is the probability that such a randomly selected
student will be a low achiever given that he comes from a two parent
family?

11. The following is a breakdown of faculty members in various ranks at the
college.

Rank                Number of Males    Number of Females
Professor                 20                  12
Assoc. Professor          18                  20
Asst. Professor           25                  30

What is the probability that a faculty member selected at random is:
(i) A female.
(ii) A female professor.
(iii) A female given that the person is a professor.
(iv) A female or a professor.
(v) A professor or an assistant professor.
(vi) Are the events of being a male and being an associate professor
statistically independent events?

12. A movie house is filled with 700 people and 60 per cent of these people 
are females. 70 per cent of these people are seated in the no smoking area 
including 300 females. What is the probability that a person picked up at 
random in the movie house is: 

(i) A male. 

(ii) A female smoker. 

(iii) A male or a non-smoker. 

(iv) A smoker if we knew that the person is a male. 

(v) Are the events sex and smoking statistically independent? 

13. A part-time student is taking two courses, namely, Statistics and Finance.
The probability that the student will pass the Statistics course is 0.60 and
the probability of passing the Finance course is 0.70. The probability that
the student will pass both courses is 0.50. Find the probability that the
student:
(i) will pass at least one course.
(ii) will pass either or both courses.
(iii) will fail both courses.

14. 200 students from the college were surveyed to find out if they were taking
any of the Management, Marketing or Finance courses. It was found that
80 of them were taking Management courses, 70 of them were taking
Marketing courses and 50 of them were taking Finance courses. It was
also found that 30 of them were taking Management and Marketing courses,
30 of them were taking Management and Finance courses and 25 of them
were taking Marketing and Finance courses. It was further determined that
20 of these students were taking courses in all the three areas. What is the
probability that a particular student is not taking any course in any of these
areas?

15. Out of 20 students in a Statistics class, 3 students are failing in the course.
If 4 students from the class are picked at random, what is the probability
that one of the failing students will be among them?

16. The New York Pick Five lottery drawing draws five numbers at random out
of 39 numbers labelled 1 to 39. How many different outcomes are possible?

17. A company has 18 senior executives. Six of these executives are women
including four blacks and two Indians. Six of these executives are to be
selected at random for a Christmas cruise. What is the probability that the
selection will include:
(i) All the black and Indian women.
(ii) At least one Indian woman.
(iii) Not more than two women.
(iv) Half men and half women.

18. The probability that a management trainee will remain with the company
after the training programme is completed is 0.70. The records indicate that
60 per cent of all managers earn over $60,000 per year. The probability that
an employee is a management trainee or who earns more than $60,000 per
year is 0.80. What is the probability that an employee earns more than
$60,000 per year, given that he is a management trainee who stayed with
the company after completing the training programme?

19. A manufacturer of laptop computer monitors has determined that on an
average, 3 per cent of screens produced are defective. A sample of one
dozen monitors from a production lot was taken at random. What is the
probability that in this sample fewer than 2 defectives will be found?

20. A fair coin is tossed 16 times. What is the probability of getting no more
than 2 heads?

21. A student is given 4 True or False questions. The student does not know
the answer to any of the questions. He tosses a fair coin. Each time he gets
a head, he selects True. What is the probability that he will get:
(i) Only one correct answer
(ii) At most 2 correct answers
(iii) At least 3 correct answers
(iv) All correct answers


22. A newly married couple plans to have 5 children. An astrologist tells them
that based on astrological reading, they have an 80 per cent chance of
having a baby boy on any particular birth. The couple would like to have
3 boys and 2 girls. Find the probability of this event.

23. An automatic machine makes paper clips from coils of wire. On an average,
one in 400 paper clips is defective. If the paper clips are packed in small
boxes of 100 clips each, what is the probability that any given box of clips
contains:
(i) No defectives
(ii) One or more defectives
(iii) Less than two defectives
(iv) Two or less defectives

24. Because of recycling campaigns, a number of empty glass soda bottles are
being returned for refilling. It has been found that 10 per cent of the
incoming bottles are chipped and hence are discarded. In the next batch of
20 bottles, what is the probability that:
(i) None will be chipped.
(ii) Two or fewer will be chipped.
(iii) Three or more will be chipped.
(iv) What is the expected number of chipped bottles in a batch of 20?

25. The customers arrive at a drive-in window of Apple bank at an average
rate of one customer per minute.
(i) What is the probability that exactly two customers arrive in a given
minute?
(ii) What is the probability of no arrivals in a particular minute?

26. Write the following sets by listing elements enclosed in brackets { }:
(i) A is the set whose elements are the first five letters of the alphabet.
(ii) B is the set of all odd integers.
(iii) X is the set of all two-digit positive numbers which are divisible by 15.

27. Write the following sets using a statement to designate each:
(i) A = {3, 6, 12, 15, 18}
(ii) B = {s, t, u, v, w, x, y, z}
(iii) C = {1, 3, 5, 7, ..., 2n − 1, ...}

fle aa: 1 ) 

E gf 


(iv) D= 


28. Indicate which of the following sets are finite:
(i) A = {x | x is a positive integer}
(ii) B = {x | x is an even integer lying between 2 and 10}
(iii) C = {x | x is a letter of the alphabet}
(iv) E = {x | x is an integer less than 10}

29. Let A be the set {1, 3, 5, 7, 9, 11, 13, 15, 17, 19}. Now, list the
following sets:
(i) {x | x is an element of A and x + 1 is even}
(ii) {x | x is an element of A and 2x is an element of A}
(iii) {x | x is an element of A and 2x < 20}
(iv) {x | x is not an element of A and 0 < x < 21}

30. Find all possible solutions for x and y in each of the following cases:
(i) {2, 3} = {2x, y}
(ii) {x, y} = {1, 2}
(iii) {x, x²} = {9, 3}

31. Show that ifa, #b,,a, #a,and {a,, b,} = {a,, ba}, then b) = b.

32. Show that {a} = {b, c} if and only if a = b = c.

33. State the relation, if any, between sets A and B in the following:
(i) A = {1, 3, 5, 7, 9, ...}
B = {3, 9, 15, 21, ..., 3(2n − 1), ...}
(ii) A = {2, 4, 7, 12, 18, 24}
B = {1, 3, 7, 11, 16, 22, 29}
(iii) A = {x | x is an even natural number less than 20}
B = {x | x is a natural number less than 20 which is divisible by 2}
(iv) A = {x | x is an even integer}
B = {x | x is an integer divisible by 3}

34. Prove that A ⊂ B and B ⊂ C implies A ⊂ C.

35. Prove that A ⊂ ∅ implies A = ∅.

36. If a set A has 101 elements, find the number of subsets of A having an odd
number of elements.

37. What are the elements of the power set of the set {1, {2, 3}}?

38. If A = {1, 2, 3, 4, 5}, B = {2, 4, 6, 8, 10}, C = {3, 6, 9, 12, 15}, find:
(i) (A ∪ B) ∩ C (ii) A ∪ (B ∩ C)
(iii) (A ∩ C) ∪ B

39. If A = {1, 2, 3, 4}, B = {3, 4, 5, 6}, C = {4, 5, 6, 7}, find:
(i) A − B (ii) (A ∪ B) − C
(iii) A − (B ∩ C) (iv) (A ∩ B) − (B ∪ C)

40. If U = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
A = {1, 4, 7, 10}
B = {2, 5, 8}
Find (i) A′ (ii) B′ (iii) A ∩ B′ (iv) A’ ABMA ARB

41. Prove that (A ∩ B) ∪ C = A ∩ (B ∪ C) if and only if C ⊂ A.

42. Prove that if A ⊂ B then P(A) ⊂ P(B).

43. For any sets A and B, prove or disprove that,

P(A) ∩ P(B) = P(A ∩ B)
P(A) ∪ P(B) = P(A ∪ B)

44. In a survey of 100 students, the numbers studying various languages were
found to be: Spanish 28; German 30; French 42; Spanish and French 10;
Spanish and German 8; German and French 5; all the three languages 3.
(i) How many students were studying no language?
(ii) How many students had French as their only language?
(iii) How many students studied German if and only if they studied French?

45. In each of the following sentences, determine which is a statement (S), or
not (N):
(i) Every rectangle is a square.
(ii) The sum of three angles of a triangle is 180°.
(iii) How are you?
(iv) 2 + 1 = 3.
(v) 2 is a rational number.

46. In a later survey of the 100 students, the numbers studying the various
languages were found to be:
German only 18; German but not Spanish 23; German and French 8; German
26; French 48; French and Spanish 8; studying no language 24.
(i) How many students took Spanish?
(ii) How many took German and Spanish but not French?
(iii) How many took French if and only if they did not take Spanish?

47. If A and B are two sets, prove that the number of elements in A ∩ B′ is
equal to:
(Number of elements in A − Number of elements in A ∩ B).

48. The report of one survey of 100 students stated that the numbers studying
the various languages were: all three languages 5; German and Spanish 10;
French and Spanish 8; German and French 20; Spanish 30; German 23; French
50. The surveyor who turned in this report was fired. Why?

49. In a recent survey of 5000 people, it was found that 2800 read Indian
Express and 2300 read Statesman while 400 read both the papers. How many
read neither Indian Express nor Statesman?

50. In a survey of 30 students, it was found that 19 take Mathematics, 17 take
Music, 11 take History, 7 take Mathematics and History, 12 take Mathematics
and Music, 5 take Music and History and 2 take all three courses. Find:
(i) The number of students that take Mathematics but do not take History.
(ii) The number that take exactly two of the three courses.

51. In a Chemistry class there are 20 students, and in a Psychology class there
are 30 students. Find the number either in a Psychology class or Chemistry
class if,
(i) The two classes meet at the same hour.
(ii) The two classes meet at different hours and 10 students are enrolled in
both courses.

52. On an Air India flight, there are 9 boys, 5 Indian children, 9 men, 7 foreign
boys, 14 Indians, 6 Indian males and 7 foreign females. What is the number of
people in this plane?

53. A college awarded 38 medals in Football, 15 in Basketball and 20 in Cricket.
If these medals went to a total of 58 men and only three of these men got
medals in all the three sports, how many men received medals in exactly two
of the three sports?

54. Suppose that in a survey concerning the reading habits of students it is
found that:
60 per cent read magazine A,
50 per cent read magazine B,
50 per cent read magazine C,
30 per cent read magazines A and B,
20 per cent read magazines B and C,
30 per cent read magazines A and C,
10 per cent read all three magazines.
(i) What per cent read exactly two magazines?
(ii) What per cent do not read any of the magazines?

55. In a survey of 500 consumers, it was found that 425 liked product A and 375
liked product B. What is the least number of consumers that must have liked
both products, assuming that there may be consumers of products different
from A and B?

56. Explain how a class is different from a set.

57. How many power sets are formed from vowels of English language?

58. Give a brief description of basic sets and denotable sets.

59. Explain how a fuzzy set is different from a generally defined set.

60. Give your comments on ‘operations on fuzzy sets’. Explain three operations.

61. What are the areas of application of fuzzy set? Explain with three examples.


Structure


2.0 Introduction 

2.1 Objectives 

2.2 Random Variables of the Discrete Type and Random Variables of the 
Continuous Type 

2.3 Answers to Check Your Progress Questions 


2.4 Summary 
2.5 Key Words 


2.6 Self-Assessment Questions and Exercises 
2.7 Further Readings 


2.0 INTRODUCTION 


Randomness means each possible entity has the same chance of being considered.
A random variable may be qualitative or quantitative in nature. You will study
probability distribution, which means a listing of all possible outcomes of an
experiment together with their probabilities. It may be discrete or continuous.


In this unit, you will study about the random variables of the discrete type 
and random variables of the continuous type. 


2.1 OBJECTIVES 


After going through this unit, you will be able to: 
• Understand the random variables of the discrete type
• Analyse the random variables of the continuous type


2.2 RANDOM VARIABLES OF THE DISCRETE 
TYPE AND RANDOM VARIABLES OF THE 
CONTINUOUS TYPE 


Discrete Probability Distributions 
When a random variable X takes discrete values $x_1, x_2, \ldots, x_n$ with
probabilities $p_1, p_2, \ldots, p_n$, we have a discrete probability
distribution of X.

The function p(x) for which $X = x_1, x_2, \ldots, x_n$ takes values
$p_1, p_2, \ldots, p_n$ is the probability function of X.

The variable is discrete because it does not assume all values. Its
properties are:

$p(x_i)$ = Probability that X assumes the value $x_i$
= Prob $(X = x_i) = p_i$

$p(x) \ge 0, \quad \sum p(x) = 1$

For example, four coins are tossed and the number of heads X noted. X can take
the values 0, 1, 2, 3, 4 heads.

$p(X = 0) = \frac{1}{16}, \quad p(X = 1) = \frac{4}{16}, \quad p(X = 2) = \frac{6}{16}, \quad p(X = 3) = \frac{4}{16}, \quad p(X = 4) = \frac{1}{16}$

$\sum_{x=0}^{4} p(x) = \frac{1}{16} + \frac{4}{16} + \frac{6}{16} + \frac{4}{16} + \frac{1}{16} = 1$

This is a discrete probability distribution.
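
The four-coin distribution above can be checked by listing all 16 equally likely outcomes. This is a minimal sketch assuming fair coins; the function name `heads_distribution` is chosen here for illustration.

```python
from fractions import Fraction
from itertools import product


def heads_distribution(n_coins):
    """Exact distribution of the number of heads in n_coins fair tosses."""
    counts = {}
    for outcome in product("HT", repeat=n_coins):
        heads = outcome.count("H")
        counts[heads] = counts.get(heads, 0) + 1
    total = 2 ** n_coins  # number of equally likely outcomes
    return {x: Fraction(c, total) for x, c in sorted(counts.items())}


dist = heads_distribution(4)
for x, px in dist.items():
    print(x, px)
print("sum =", sum(dist.values()))
```

`Fraction` reduces each value to lowest terms, so 4/16 and 6/16 are shown as 1/4 and 3/8.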
Example 1: If a discrete variable X has the following probability function,
then find (i) a (ii) p(X ≤ 3) (iii) p(X > 3).

x      p(x)
0      0
1      a
2      2a
3      2a²
4      4a²
5      2a

Solution: Since $\sum p(x) = 1$, $0 + a + 2a + 2a^2 + 4a^2 + 2a = 1$

$6a^2 + 5a - 1 = 0$, so that $(6a - 1)(a + 1) = 0$

$a = \frac{1}{6}$ or $a = -1$ (not admissible)

For $a = \frac{1}{6}$, $p(X \le 3) = 0 + a + 2a + 2a^2 = 2a^2 + 3a = \frac{5}{9}$

$p(X > 3) = 4a^2 + 2a = \frac{4}{9}$

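
The arithmetic of Example 1 can be verified with exact fractions. The sketch assumes the probability function reads p(0) = 0, p(1) = a, p(2) = 2a, p(3) = 2a², p(4) = 4a², p(5) = 2a, which is the reading implied by the sum used in the solution.

```python
from fractions import Fraction

a = Fraction(1, 6)  # admissible root of 6a^2 + 5a - 1 = 0

# Probability function of Example 1 (assumed reading of the table).
pmf = {0: Fraction(0), 1: a, 2: 2 * a, 3: 2 * a**2, 4: 4 * a**2, 5: 2 * a}

total = sum(pmf.values())
p_le_3 = sum(p for x, p in pmf.items() if x <= 3)
p_gt_3 = sum(p for x, p in pmf.items() if x > 3)

print(total, p_le_3, p_gt_3)
```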
Discrete Distributions 


There are several discrete distributions. Some other discrete distributions are 
described below. 


Uniform or Rectangular Distribution 
Each possible value of the random variable x has the same probability in the
uniform distribution. If x takes values $x_1, x_2, \ldots, x_k$, then

$p(x_i, k) = \frac{1}{k}$

The numbers on a die follow the uniform distribution,

$p(x_i, 6) = \frac{1}{6}$ (Here x = 1, 2, 3, 4, 5, 6)


Bernoulli Trials 


In a Bernoulli experiment, an event E either happens or does not happen (E′).
Examples are getting a head on tossing a coin, getting a six on rolling a die
and so on.


The Bernoulli random variable is written,
X = 1 if E occurs
  = 0 if E′ occurs

Since there are two possible values, it is a case of a discrete variable
where,

Probability of success = p = p(E)
Probability of failure = 1 − p = q = p(E′)

We can write,
For k = 1, f(k) = p
For k = 0, f(k) = q
For k = 0 or 1, $f(k) = p^{k} q^{1-k}$

Negative Binomial


In this distribution the variance is larger than the mean.


Suppose the probability of success p in a series of independent Bernoulli
trials remains constant.
Suppose the rth success occurs after x failures in x + r trials.
1. The probability of the success of the last trial is p.
2. The number of remaining trials is x + r − 1, in which there should be
r − 1 successes. The probability of r − 1 successes is given by,

${}^{x+r-1}C_{r-1}\, p^{r-1} q^{x}$

The combined probability of cases (1) and (2) happening together is,

$p(x) = p \cdot {}^{x+r-1}C_{r-1}\, p^{r-1} q^{x} = {}^{x+r-1}C_{r-1}\, p^{r} q^{x}, \quad x = 0, 1, 2, \ldots$

This is the Negative Binomial distribution. We can write it in an alternative
form,

$p(x) = {}^{-r}C_{x}\, p^{r} (-q)^{x}, \quad x = 0, 1, 2, \ldots$

This can be summed up as follows:
In an infinite series of Bernoulli trials, the probability that x + r trials
will be required to get r successes is the Negative Binomial,

$p(x) = {}^{x+r-1}C_{r-1}\, p^{r} q^{x}, \quad r > 0$

If r = 1, it becomes the Geometric distribution.
If q → 0, r → ∞ and rq = m (a constant), then the negative binomial tends to
the Poisson distribution.
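
The Negative Binomial formula above can be checked numerically. This is a sketch only: the function name `neg_binomial_pmf` and the parameter values r = 3, p = 0.4 are assumptions chosen for illustration.

```python
from math import comb


def neg_binomial_pmf(x, r, p):
    """Probability of x failures before the r-th success,
    with constant success probability p in each trial."""
    q = 1 - p
    return comb(x + r - 1, r - 1) * p**r * q**x


r, p = 3, 0.4
q = 1 - p
xs = range(500)  # the tail beyond 500 failures is negligible here

total = sum(neg_binomial_pmf(x, r, p) for x in xs)
mean = sum(x * neg_binomial_pmf(x, r, p) for x in xs)
variance = sum(x * x * neg_binomial_pmf(x, r, p) for x in xs) - mean**2

print(total, mean, variance)  # variance exceeds the mean
```

Setting r = 1 reduces `neg_binomial_pmf(x, 1, p)` to p·qˣ, the Geometric distribution.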
Geometric Distribution
Suppose the probability of success p in a series of independent trials remains
constant.


Suppose the first success occurs after x failures, i.e., there are x failures
preceding the first success. The probability of this event will be given by
$p(x) = q^{x} p$ (x = 0, 1, 2, ...).

This is the Geometric distribution and can be derived from the Negative
Binomial. If we put r = 1 in the Negative Binomial distribution:

$p(x) = {}^{x+r-1}C_{r-1}\, p^{r} q^{x}$

We get the Geometric distribution,

$p(x) = {}^{x}C_{0}\, p^{1} q^{x} = p q^{x}$

$\sum_{x=0}^{\infty} p(x) = p \sum_{x=0}^{\infty} q^{x} = \frac{p}{1-q} = 1$

$E(x) = \text{Mean} = \frac{q}{p}$

$\text{Variance} = \frac{q}{p^{2}}$

$\text{Mode} = 0$

Example 2: Find the expectation of the number of failures preceding the first
success in an infinite series of independent trials with constant probability p
of success.


Solution: The probability of success in,
1st trial = p (Success at once)
2nd trial = qp (One failure, then success)
3rd trial = q²p (Two failures, then success, and so on)

The expected number of failures preceding the success,

$E(x) = 0 \cdot p + 1 \cdot pq + 2 \cdot pq^{2} + \cdots$
$= pq(1 + 2q + 3q^{2} + \cdots)$
$= pq \cdot \frac{1}{(1-q)^{2}} = \frac{pq}{p^{2}} = \frac{q}{p}$

Since p = 1 − q.
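
The result E(x) = q/p of Example 2 can be confirmed by summing the series numerically. This sketch truncates the infinite sum after a fixed number of terms; the function name and the cut-off of 2000 terms are assumptions that suit the moderate values of p used below.

```python
def geometric_mean_failures(p, terms=2000):
    """Approximate E(x) = sum of x * p * q**x over x = 0, 1, 2, ...
    by truncating the series after `terms` terms."""
    q = 1 - p
    return sum(x * p * q**x for x in range(terms))


for p in (0.25, 0.5):
    q = 1 - p
    print(p, geometric_mean_failures(p), q / p)
```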
Hypergeometric Distribution


From a finite population of size N, a sample of size n is drawn without
replacement. Let there be $N_1$ successes out of N.
The number of failures is $N_2 = N - N_1$.


The distribution of the random variable X, which is the number of successes
obtained in the above case, is called the Hypergeometric distribution.

$p(x) = \frac{{}^{N_1}C_{x}\; {}^{N_2}C_{n-x}}{{}^{N}C_{n}}, \quad x = 0, 1, 2, \ldots, n$

Here x is the number of successes in the sample and n − x is the number of
failures in the sample.
It can be shown that,

Mean: $E(X) = \frac{nN_1}{N}$

Variance: $\mathrm{Var}(X) = \frac{N-n}{N-1}\left(\frac{nN_1}{N} - \frac{nN_1^{2}}{N^{2}}\right)$

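Example 3: There are 20 lottery tickets with three prizes. Find the probability
that out of 5 tickets purchased exactly two prizes are won.


Solution: We have $N_1 = 3$, $N_2 = N - N_1 = 17$, x = 2, n = 5.

$p(2) = \frac{{}^{3}C_{2}\; {}^{17}C_{3}}{{}^{20}C_{5}} = \frac{5}{38}$

The probability of no prize, $p(0) = \frac{{}^{3}C_{0}\; {}^{17}C_{5}}{{}^{20}C_{5}} = \frac{91}{228}$

The probability of exactly 1 prize, $p(1) = \frac{{}^{3}C_{1}\; {}^{17}C_{4}}{{}^{20}C_{5}} = \frac{35}{76}$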
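
The lottery probabilities of Example 3 can be evaluated exactly with Python's `math.comb`. The function name `hypergeom_pmf` and its argument order are assumptions made for this sketch.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x, n, N1, N):
    """Probability of x successes in a sample of size n drawn without
    replacement from N items, of which N1 are successes."""
    return Fraction(comb(N1, x) * comb(N - N1, n - x), comb(N, n))


# Example 3: N = 20 tickets, N1 = 3 prizes, n = 5 tickets purchased.
for x in range(4):
    print(x, hypergeom_pmf(x, 5, 3, 20))
```

The four probabilities for x = 0, 1, 2, 3 add up to 1, since at most three prizes can be won.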


Example 4: Examine the nature of the distribution if r balls are drawn, one at
a time without replacement, from a bag containing m white and n black balls.


Solution: It is the hypergeometric distribution. It corresponds to the
probability that x balls will be white out of r balls so drawn and is given by,

$p(x) = \frac{{}^{m}C_{x}\; {}^{n}C_{r-x}}{{}^{m+n}C_{r}}$


Multinomial


There are k possible outcomes of trials, viz., $x_1, x_2, \ldots, x_k$ with
probabilities $p_1, p_2, \ldots, p_k$. n independent trials are performed. The
multinomial distribution gives the probability that out of these trials,
$x_1$ occurs $n_1$ times, $x_2$ occurs $n_2$ times and so on. This is given by,

$\frac{n!}{n_1!\, n_2! \cdots n_k!}\; p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$

Where, $\sum_{i=1}^{k} n_i = n$
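
The multinomial probability can be computed directly from its formula. The following sketch uses a hypothetical function name `multinomial_pmf` and example probabilities (0.2, 0.3, 0.5) that are not taken from the text.

```python
from math import factorial


def multinomial_pmf(counts, probs):
    """Probability that outcome i occurs counts[i] times in
    n = sum(counts) independent trials with probabilities probs."""
    n = sum(counts)
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)  # n! / (n1! n2! ... nk!), exact at each step
    prob = 1.0
    for c, p in zip(counts, probs):
        prob *= p ** c
    return coeff * prob


# Three outcomes, each occurring once in three trials.
print(multinomial_pmf([1, 1, 1], [0.2, 0.3, 0.5]))
```

With only two outcomes (k = 2) the same function reproduces the binomial probability.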
Characteristic Features of the Binomial Distribution


The following are the characteristics of Binomial distribution:

1. It is a discrete distribution.

2. It gives the probability of x successes and n − x failures in a specific
order.

3. The experiment consists of n repeated trials.

4. Each trial results in a success or a failure.

5. The probability of success remains constant from trial to trial.

6. The trials are independent.

7. The success probability p of any outcome remains constant over time. This
condition is usually not fully satisfied in situations involving management
and economics; for example, the probability of response from successive
informants is not the same. However, it may be assumed that the condition
is reasonably well satisfied in many cases and that the outcome of one trial
does not depend on the outcome of another. This condition, too, may not be
fully satisfied in many cases. An investigator may not approach a second
informant with the same frame of mind as used for the first informant.

8. The binomial distribution depends on two parameters n and p. Each set of
different values of n, p has a different binomial distribution.

9. If p = 0.5, the distribution is symmetrical. For a symmetrical
distribution, in n trials,
Prob (X = 0) = Prob (X = n)
i.e., the probabilities of 0 or n successes in n trials will be the same.
Similarly, Prob (X = 1) = Prob (X = n − 1) and so on.
If p > 0.5, the distribution is not symmetrical. The probabilities on the
right are larger than those on the left. The reverse case is when p < 0.5.
When n becomes large the distribution becomes bell shaped. Even when n
is not very large but p = 0.5, it is fairly bell shaped.

10. The binomial distribution can be approximated by the normal. As n becomes
large and p is close to 0.5, the approximation becomes better.
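
The symmetry property for p = 0.5 and the skew for p > 0.5 can be seen by tabulating the binomial probabilities. This is a sketch; the function name `binomial_pmf` and the choice n = 10 are illustrative assumptions.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)


n = 10
symmetric = [binomial_pmf(x, n, 0.5) for x in range(n + 1)]
skewed = [binomial_pmf(x, n, 0.7) for x in range(n + 1)]

# For p = 0.5, Prob(X = x) equals Prob(X = n - x).
print([round(v, 4) for v in symmetric])
# For p = 0.7, the probabilities on the right are the larger ones.
print([round(v, 4) for v in skewed])
```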


Example 5: If the ratio n/N, i.e., sample size to population size is small, the
result given by the Binomial may not be reliable. Comment.


Solution: When the distribution is binomial, each successive trial, being
independent of other trials, has constant probability of success. If the
sampling of n items is without replacement from a population of size N, the
probability of success of any event depends upon what happened in the previous
events. In this case the Binomial cannot be used unless the ratio n/N is small.
Even then there is no guarantee of getting accurate results.


The Binomial should be used only if the ratio n/N is very small, say less
than 0.05.
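
The effect of the ratio n/N can be illustrated by comparing the exact hypergeometric probabilities with their binomial approximation. The sketch below uses hypothetical population sizes (N = 20 and N = 1000, both with 30 per cent successes) and hypothetical function names; none of these figures come from the text.

```python
from math import comb


def hypergeom_pmf(x, n, N1, N):
    """Exact probability when sampling n items without replacement."""
    return comb(N1, x) * comb(N - N1, n - x) / comb(N, n)


def binomial_pmf(x, n, p):
    """Binomial probability with constant success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)


def max_abs_gap(n, N1, N):
    """Largest difference between the two distributions over x = 0..n."""
    p = N1 / N
    return max(abs(hypergeom_pmf(x, n, N1, N) - binomial_pmf(x, n, p))
               for x in range(n + 1))


print("n/N = 0.25 :", max_abs_gap(5, 6, 20))
print("n/N = 0.005:", max_abs_gap(5, 300, 1000))
```

The gap is expected to be much smaller in the second case, where the sample is a tiny fraction of the population.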
Example 6: Explain the concept of a discrete probability distribution.


Solution: If a random variable x assumes n discrete values
$x_1, x_2, \ldots, x_n$ with respective probabilities $p_1, p_2, \ldots, p_n$
$(p_1 + p_2 + \cdots + p_n = 1)$, then the distribution of values $x_i$ with
probabilities $p_i$ (i = 1, 2, ..., n) is called the discrete probability
distribution of x.


The frequency function or frequency distribution of x is defined by p(x),
which for different values $x_1, x_2, \ldots, x_n$ of x gives the corresponding
probabilities:

$p(x_i) = p_i$ where $p(x) \ge 0$, $\sum p(x) = 1$

Example 7: For the following probability distribution, find p(x > 4) and
p(x ≤ 4):

x       0     1     2     3     4     5
p(x)    0     a     a/2   a/2   a/4   a/4

Solution: Since $\sum p(x) = 1$,

$$0 + a + \frac{a}{2} + \frac{a}{2} + \frac{a}{4} + \frac{a}{4} = 1$$

$$\frac{5}{2}a = 1 \quad \text{or} \quad a = \frac{2}{5}$$

$$p(x > 4) = p(x = 5) = \frac{a}{4} = \frac{1}{10}$$

$$p(x \le 4) = 0 + a + \frac{a}{2} + \frac{a}{2} + \frac{a}{4} = \frac{9a}{4} = \frac{9}{10}$$

Example 8: A fair coin is tossed 400 times. Find the mean number of heads and
the corresponding standard deviation.


Solution: This is a case of Binomial distribution with $p = q = \frac{1}{2}$, $n = 400$.

The mean number of heads is given by $\mu = np = 400 \times \frac{1}{2} = 200$

and S.D. $\sigma = \sqrt{npq} = \sqrt{400 \times \frac{1}{2} \times \frac{1}{2}} = 10$


Example 9: A manager has thought of 4 planning strategies each of which has an
equal chance of being successful. What is the probability that at least one of his
strategies will work if he tries them in 4 situations? Here $p = \frac{1}{4}$, $q = \frac{3}{4}$.


Solution: The probability that none of the strategies will work is given by,

$$p(0) = {}^{4}C_{0} \left(\frac{1}{4}\right)^{0} \left(\frac{3}{4}\right)^{4} = \frac{81}{256}$$

The probability that at least one of them will work is therefore

$$1 - \frac{81}{256} = \frac{175}{256}$$


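Example 10: Suppose the proportion of people preferring a car C is 0.5. Let X
denote the number of people out of a set of 3 who prefer C. The probabilities of
0, 1, 2, 3 of them preferring C are,


Solution:

$$p(X = 0) = {}^{3}C_{0}\,(0.5)^{0}(0.5)^{3} = \frac{1}{8}$$

$$p(X = 1) = {}^{3}C_{1}\,(0.5)^{1}(0.5)^{2} = \frac{3}{8}$$

$$p(X = 2) = {}^{3}C_{2}\,(0.5)^{2}(0.5)^{1} = \frac{3}{8}$$

$$p(X = 3) = {}^{3}C_{3}\,(0.5)^{3}(0.5)^{0} = \frac{1}{8}$$

$$\mu = E(X) = \sum x_i p_i = 0 \times \frac{1}{8} + 1 \times \frac{3}{8} + 2 \times \frac{3}{8} + 3 \times \frac{1}{8} = 1.5$$

$$\sigma^2 = E(X - \mu)^2 = E(X^2) - \mu^2 = \sum x_i^2 p_i - \mu^2$$

$$= 0^2 \times \frac{1}{8} + 1^2 \times \frac{3}{8} + 2^2 \times \frac{3}{8} + 3^2 \times \frac{1}{8} - 1.5^2$$

$$= 0.75$$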
Example 11: For the Poisson distribution, write the probabilities of 0, 1, 2, ....
successes.
Solution:

x     $p(x) = e^{-m} \dfrac{m^{x}}{x!}$
0     $p(0) = e^{-m} m^{0}/0!$
1     $p(1) = e^{-m} \dfrac{m}{1!} = p(0) \cdot m$
2     $p(2) = e^{-m} \dfrac{m^{2}}{2!} = p(1) \cdot \dfrac{m}{2}$
3     $p(3) = e^{-m} \dfrac{m^{3}}{3!} = p(2) \cdot \dfrac{m}{3}$

and so on.
Total of all probabilities $\sum p(x) = 1$
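
The step-by-step pattern in the table, where each probability is the previous one multiplied by m/x, can be sketched in code as follows. The value m = 2 and the cut-off of 30 terms are assumptions made only for this illustration.

```python
from math import exp

def poisson_probabilities(m: float, upto: int) -> list[float]:
    """Return [p(0), ..., p(upto)] using p(x) = p(x - 1) * m / x."""
    probs = [exp(-m)]                  # p(0) = e^{-m}
    for x in range(1, upto + 1):
        probs.append(probs[-1] * m / x)
    return probs

probs = poisson_probabilities(m=2.0, upto=30)
print([round(p, 4) for p in probs[:4]])
print(round(sum(probs), 6))
```

The printed total should be very close to 1, in line with the last row of the solution.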
Example 12: What are the raw moments of Poisson distribution?


Solution: First raw moment $\mu'_1 = m$
Second raw moment $\mu'_2 = m^2 + m$
Third raw moment $\mu'_3 = m^3 + 3m^2 + m$

Example 13: For a Poisson distribution p(0) = p(1), find p(x > 0).
Solution: We have $e^{-m} m^{0}/0! = e^{-m} m^{1}/1!$ so that m = 1

∴ $p(0) = e^{-1} = 1/2.718 = 0.37$

p(x > 0) = 1 − p(0) = 0.63


Example 14: It is claimed that the TV branded M is demanded by 70% of
customers. If X is the number of TVs demanded, find the probability distribution
of X when there are four customers.


Solution: If all the four demand M then $p(4) = 0.7^{4} = 0.2401$. The probability
distribution is

X    4        3        2        1        0
p    0.2401   0.4116   0.2646   0.0756   0.0081

These values may be plotted.


Note: Poisson Approximation to the Binomial. When the number of trials n
is large and the probability p of a success is small, the binomial distribution can be
approximated by the Poisson. This approximation is useful in practice.


If p = 0.002, n = 1000, m = np = 2
The probability of 2 successes in 1000 trials is,

$$P(X = 2) = {}^{n}C_{2}\, p^{2} q^{n-2} = {}^{1000}C_{2}\,(0.002)^{2}(0.998)^{998}$$

Similarly, $P(X = 3) = {}^{1000}C_{3}\,(0.002)^{3}(0.998)^{997}$, etc.


These terms are difficult to calculate. If we employ the Poisson, we have a
much easier task before us.


m = np = 1000 × 0.002 = 2

$$P(X = 2) = \frac{e^{-m} m^{x}}{x!} = \frac{e^{-2}\, 2^{2}}{2!} = 0.1353 \times 2 = 0.2706$$

$$P(X = 3) = \frac{e^{-m} m^{x}}{x!} = \frac{e^{-2}\, 2^{3}}{3!} = 0.1804$$


Example 15: One in every 100 items made on a machine is defective. Out of 25
items picked, find the probability of 1 item being defective.


p = 0.01, q = 0.99, n = 25, np = 0.25
Binomial: $p(1) = {}^{25}C_{1}\,(0.01)^{1}(0.99)^{24} = 0.1964$

Poisson: $p(1) = \dfrac{e^{-0.25}\,(0.25)^{1}}{1!} = 0.1947$
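
The closeness of the two distributions in Example 15, and in the note on the Poisson approximation, can be checked with a short computation. This is a sketch only; the function names `binomial` and `poisson` are chosen for this illustration.

```python
from math import comb, exp, factorial

def binomial(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson(x, m):
    return exp(-m) * m**x / factorial(x)

# Example 15: n = 25, p = 0.01, one defective item
print(round(binomial(1, 25, 0.01), 4), round(poisson(1, 25 * 0.01), 4))

# The note on the approximation: n = 1000, p = 0.002, two and three successes
for x in (2, 3):
    print(x, round(binomial(x, 1000, 0.002), 4), round(poisson(x, 2.0), 4))
```

The exact binomial terms that are "difficult to calculate" by hand are easy for a machine, and they agree with the Poisson values to about three decimal places.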


Continuous Probability Distributions 


When a random variate can take any value in the given interval a < x < b, it is a
continuous variate and its distribution is a Continuous Probability Distribution.


Theoretical distributions are often continuous. They are useful in practice because 
they are convenient to handle mathematically. They can serve as good 
approximations to discrete distributions. 


The range of the variate may be finite or infinite. 
A continuous random variable can take all values in a given interval. 
A continuous probability distribution is represented by a smooth curve. 


The total area under the curve for a probability distribution is necessarily 
unity. The curve is always above the x axis because the area under the curve for 
any interval represents probability and probabilities cannot be negative. 


If X is a continuous variable, the probability of X falling in an interval with end
points $z_1$, $z_2$ may be written $p(z_1 \le X \le z_2)$.


This probability corresponds to the shaded area under the curve between $z_1$ and $z_2$.


A function is a probability density function if,

$$\int_{-\infty}^{\infty} p(x)\,dx = 1, \quad p(x) \ge 0, \quad -\infty < x < \infty,$$

i.e., the area under the curve p(x) is 1 and the probability of x lying between two
values a, b, i.e., p(a < x < b), is non-negative. The most prominent example of a
continuous probability function is the normal distribution.


Cumulative Probability Function (CPF) 

The Cumulative probability function (CPF) shows the probability that x takes a
value less than or equal to, say, z and corresponds to the area under the curve up
to z:

$$p(x \le z) = \int_{-\infty}^{z} p(x)\,dx$$

This is denoted by F(z).


Check Your Progress 
1. Explain the moment generating function with reference to probability theory. 
2. Explain discrete probability distribution. 
3. What is a hypergeometric distribution?
4. Explain continuous probability distribution. 
2.3 ANSWERS TO CHECK YOUR PROGRESS
QUESTIONS


1. According to probability theory, moment generating function generates the
moments for the probability distribution of a random variable X, and can be
defined as:

$$M_X(t) = E(e^{tX}), \quad t \in \mathbb{R}$$

2. When a random variable x takes discrete values $x_1, x_2, \ldots, x_n$ with probabilities
$p_1, p_2, \ldots, p_n$, we have a discrete probability distribution of X.


3. The distribution of the random variable X which is the number of successes
obtained is called the hypergeometric distribution.


4. When a random variate can take any value in the given interval a < x < b, it
is a continuous variate and its distribution is a Continuous Probability
Distribution.


2.4 SUMMARY 


• When a random variable x takes discrete values $x_1, x_2, \ldots, x_n$ with
probabilities $p_1, p_2, \ldots, p_n$, we have a discrete probability distribution of X.


• Each possible value of the random variable x has the same probability in
the uniform distribution. If x takes values $x_1, x_2, \ldots, x_n$, then

$$p(x_i) = \frac{1}{n}$$

• In a Bernoulli experiment, an event E either happens or does not happen
(E′).

• Suppose, the probability of success p in a series of independent Bernoulli
trials remains constant.


• Suppose, the first success occurs after x failures, i.e., there are x failures
preceding the first success. The probability of this event will be given by
$P(x) = q^{x} p$ $(x = 0, 1, 2, \ldots)$

• The multinomial distribution gives the probability that out of these n trials, $x_1$
occurs $n_1$ times, $x_2$ occurs $n_2$ times and so on. This is given by

$$\frac{n!}{n_1!\, n_2! \cdots n_k!}\; p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$$

where $p_i$ is the probability of $x_i$ in a single trial.


• When a random variate can take any value in the given interval a < x < b, it
is a continuous variate and its distribution is a Continuous Probability
Distribution.

• The Cumulative probability function (CPF) shows the probability that x
takes a value less than or equal to, say, z and corresponds to the area under
the curve up to z:

$$p(x \le z) = \int_{-\infty}^{z} p(x)\,dx$$


2.5 KEY WORDS 


• Hypergeometric distribution: The distribution of the random variable X,
which is the number of successes obtained when sampling without replacement
from a finite population, is called the Hypergeometric distribution.


• Random variable: It is a variable which takes on different values as a
result of the outcomes of a random experiment. It can be either discrete or
continuous.


2.6 SELF-ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 


1. Why is a random variable considered important in statistics?
2. Explain the techniques of assigning probability. 

3. What is moment generating function? 

4. Explain briefly the probability distribution and its types. 


Long-Answer Questions 


1. Differentiate between a discrete and a continuous variable. 

2. A continuous variable is an uninterrupted motion like the fall of a rain drop. 
Comment on it. 

3. There are 3 tickets numbered 0, 2, 3. One ticket is selected and replaced;
another ticket is selected and replaced. A third ticket is selected and replaced
once again. If X stands for the sum of the 3 ticket numbers, construct the
probability distribution of X and find m and σ².


4. A car insurance policy offers Rs. 1000 after an accident the probability of
whose occurrence is 0.04. If the expected gain is to be zero what should be
the premium?
5. For the probability distribution, 
x -1 ee 3 7.5 8 
p 0.2 0.15 0.3 0.1 0.05 
E(x) = 2.15 V(x) = 9.5 
6. In a queue the probability of the number of people joining per minute is given
below.

Number of persons joining the queue    0      1      2      3
Probability                            0.4    0.3    0.2    0.1


7. From an urn containing 20 black and 30 red balls, 6 balls are drawn at 
random. Find the probability that no red ball is selected. 


8. Examine the nature of the distribution if r balls are drawn, one at a time,
without replacement, from a bag containing m white and n black balls.


9. 20 per cent of bolts in a factory are defective. Deduce the probability 
distribution of the number of defectives in a sample of 5. 
3.0 INTRODUCTION


An expected value is the sum of the products of each possible outcome and the
probability of occurrence of that outcome. Expectation may be conditional or iterated.
You will study the moment generating function (MGF), which generates the moments for the
probability distribution of a random variable. The subject of probability in itself is
a cumbersome one, hence, only the basic concepts will be discussed here.


Since the outcomes of most decisions cannot be accurately predicted because 
of the impact of many uncontrollable and unpredictable variables, it is necessary 
that all the known risks be scientifically evaluated. Probability theory, sometimes 
referred to as the science of uncertainty, is very helpful in such evaluations. It helps 
the decision-maker with only limited information to analyse the risks and select the 
strategy of minimum risk. 


In this unit, you will study about the properties of the distribution function,
the expectation of a random variable and Chebyshev’s inequality.


3.1 OBJECTIVES 


After going through this unit, you will be able to: 
• Describe the properties of the distribution function
• Analyse the expectation of a random variable
• Understand Chebyshev’s inequality


3.2 PROPERTIES OF THE DISTRIBUTION FUNCTION


The distribution function of a random variable is the function that assigns to
each number x the probability that the value of the random variable is less than
or equal to x.


The distribution function is also known as the cumulative frequency distribution
or cumulative distribution function. Fundamentally, it defines the probability that
the value of the variable X is less than or equal to the number ‘x’.


Probability distributions indicate the likelihood of an event or outcome, i.e.,
a probability distribution is a function that describes the likelihood of obtaining the
possible values that a random variable can assume. In other words, the values of
the variable vary based on the underlying probability distribution. This type of
distribution is very useful when you want to know which outcomes are most likely,
the spread of potential values, and the likelihood of different results. The sum of all
probabilities for all possible values must equal 1. Furthermore, the probability for
a particular value or range of values must be between 0 and 1. Probability
distributions, therefore, describe the dispersion of the values of a random variable.


A random variable is a real valued function on the probability space. We
can compute the probability that a random variable takes values in an interval by
subtracting the distribution function evaluated at the endpoints of the interval.
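
The subtraction rule can be illustrated with a small sketch. It assumes a fair die as the random variable (an illustrative choice, not taken from this passage) and uses exact fractions; the names `pmf`, `F` and `prob_interval` are hypothetical.

```python
from fractions import Fraction

# Fair die: each face 1..6 has probability 1/6 (illustrative choice).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def F(x):
    """Distribution function F(x) = P(X <= x)."""
    return sum(p for value, p in pmf.items() if value <= x)

def prob_interval(a, b):
    """P(a < X <= b) = F(b) - F(a)."""
    return F(b) - F(a)

print(F(0), F(3), F(6))        # 0 1/2 1
print(prob_interval(2, 5))     # 1/2
```

Here P(2 < X ≤ 5) covers the faces 3, 4 and 5, so the result 1/2 agrees with adding the three probabilities directly.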


3.2.1 Function of a Random Variable 


A random variable is a variable that takes on different values as a result of
the outcomes of a random experiment. In other words, a function which assigns
numerical values to each element of the set of events that may occur (i.e., every
element in the sample space) is termed a random variable. The value of a random
variable is the general outcome of the random experiment. One should always
make a distinction between the random variable and the values that it can take on.
All this can be illustrated by a few examples shown in the Table 3.1.

Table 3.1 Random Variable

Random Variable   Values of the Random Variable         Description of the Values of the Random Variable
X                 0, 1, 2, 3, 4                         Possible number of heads in four tosses of a fair coin
Y                 1, 2, 3, 4, 5, 6                      Possible outcomes in a single throw of a die
Z                 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12    Possible outcomes from throwing a pair of dice
M                 0, 1, 2, 3, 4, 5, ....., S            Possible sales of newspapers by a newspaper boy, S representing his stock

All these above stated random variable assignments cover every possible
outcome and each numerical value represents a unique set of outcomes. A random
variable can be either discrete or continuous. If a random variable is allowed
to take on only a limited number of values, it is a discrete random variable but if it
is allowed to assume any value within a given range, it is a continuous random
variable. Random variables presented in the above table are examples of discrete
random variables. We can have continuous random variables if they can take on
any value within a range of values, for example, between 2 and 5; in that case we
write the values of a random variable x as follows:
2 ≤ x ≤ 5


Techniques of Assigning Probabilities 


We can assign probability values to the random variables. Since the assignment of 
probabilities is not an easy task, we should observe certain rules in this context as 
given below: 


(i) A probability cannot be less than zero or greater than one, i.e., 0 ≤ pr ≤ 1,
where pr represents probability.


(ii) The sum of all the probabilities assigned to each value of the random variable
must be exactly one.


There are three techniques of assignment of probabilities to the values of the random 
variable: 


(a) Subjective Probability Assignment. It is the technique of assigning 
probabilities on the basis of personal judgement. Such assignment may differ 
from individual to individual and depends upon the expertise of the person 
assigning the probabilities. It cannot be termed as a rational way of assigning 
probabilities but is used when the objective methods cannot be used for 
one reason or the other. 


(b) A-Priori Probability Assignment. It is the technique under which the 
probability is assigned by calculating the ratio of the number of ways in 
which a given outcome can occur to the total number of possible outcomes. 
The basic underlying assumption in using this procedure is that every possible 
outcome is likely to occur equally. But at times the use of this technique 
gives ridiculous conclusions. For example, we have to assign probability to 
the event that a person of age 35 will live up to age 36. There are two
possible outcomes, he lives or he dies. If the probability assigned in 
accordance with a-priori probability assignment is half then the same may 
not represent reality. In such a situation, probability can be assigned by 
some other techniques. 


(c) Empirical Probability Assignment. It is an objective method of assigning
probabilities and is used by the decision-makers. Using this technique the
probability is assigned by calculating the relative frequency of occurrence
of a given event over an infinite number of occurrences. However, in practice
only a finite (perhaps very large) number of cases are observed and relative
frequency of the event is calculated. The probability assignment through
this technique may as well be unrealistic, if future conditions do not happen
to be a reflection of the past.


Thus, what constitutes the ‘best’ method of probability assignment can only 
be judged in the light of what seems best to depict reality. It depends upon 
the nature of the problem and also on the circumstances under which the 
problem is being studied. 
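
The empirical technique can be imitated on a computer by repeating an experiment many times and recording the relative frequency of an event. The sketch below assumes a simulated fair die and the event "a six is thrown"; the function names, the seed and the numbers of trials are all hypothetical choices for illustration.

```python
import random

def relative_frequency(event, experiment, trials, seed=0):
    """Estimate P(event) as (number of occurrences) / (number of trials)."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if event(experiment(rng)))
    return hits / trials

def roll_die(rng):
    return rng.randint(1, 6)

def is_six(outcome):
    return outcome == 6

for trials in (100, 10_000, 200_000):
    print(trials, relative_frequency(is_six, roll_die, trials))
```

As the number of trials grows, the relative frequency should settle near the a-priori value 1/6, which is the sense in which a finite but large number of observations stands in for an infinite one.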


Variance and Standard Deviation of Random Variable 


The mean or the expected value of random variable may not be adequate 
enough at times to study the problem as to how random variable actually 
behaves and we may as well be interested in knowing something about how the 
values of random variable are dispersed about the mean. In other words, we 
want to measure the dispersion of random variable (X) about its expected 
value, i.e., E(X). The variance and the standard deviation provide measures of 
this dispersion. 

The variance of a random variable is defined as the sum of the squared
deviations of the values of the random variable from the expected value, weighted
by their probabilities. Mathematically, we can write it as follows:

$$\mathrm{Var}(X) = \sigma_X^2 = \sum_{i=1}^{n} [X_i - E(X)]^2 \cdot pr(X_i)$$

Alternatively, it can also be written as,

$$\mathrm{Var}(X) = \sigma_X^2 = \sum_{i=1}^{n} X_i^2 \cdot pr(X_i) - [E(X)]^2$$

Where, E(X) is the expected value of the random variable.
$X_i$ is the ith value of the random variable.
$pr(X_i)$ is the probability of the ith value.


The standard deviation of a random variable is the square root of the
variance of the random variable and is denoted as,

$$\sigma_X = \sqrt{\sigma_X^2}$$

The variance of a constant times a random variable is the constant squared
times the variance of the random variable. This can be symbolically written as,

$$\mathrm{Var}(cX) = c^2\, \mathrm{Var}(X)$$

The variance of a sum of independent random variables equals the sum
of the variances.

Thus,

$$\mathrm{Var}(X + Y + Z) = \mathrm{Var}(X) + \mathrm{Var}(Y) + \mathrm{Var}(Z)$$

if X, Y and Z are independent of each other.


The following examples will illustrate the method of calculation of these
measures of a random variable.


Example 1: Calculate the mean, the variance and the standard deviation for 
random variable sales from the following information provided by a sales manager 
of a certain business unit for a new product: 


Monthly Sales (in units) Probability 
50 0.10 
100 0.30 
150 0.30 
200 0.15 
250 0.10 
300 0.05 


Solution: 


The given information may be developed as shown in the following table for
calculating mean, variance and the standard deviation for random variable sales:

Monthly Sales       Probability   $X_i \cdot pr(X_i)$   $(X_i - E(X))^2$          $(X_i - E(X))^2 \cdot pr(X_i)$
(in units) $X_i$    $pr(X_i)$
$X_1$ = 50          0.10          5.00                  (50 − 150)² = 10000       1000.00
$X_2$ = 100         0.30          30.00                 (100 − 150)² = 2500       750.00
$X_3$ = 150         0.30          45.00                 (150 − 150)² = 0          0.00
$X_4$ = 200         0.15          30.00                 (200 − 150)² = 2500       375.00
$X_5$ = 250         0.10          25.00                 (250 − 150)² = 10000      1000.00
$X_6$ = 300         0.05          15.00                 (300 − 150)² = 22500      1125.00

$\sum X_i \cdot pr(X_i) = 150.00$                       $\sum (X_i - E(X))^2 \cdot pr(X_i) = 4250.00$


Mean of random variable sales = $\bar{X}$
or, $E(X) = \sum X_i \cdot pr(X_i) = 150$


Variance of random variable sales,

or, $\sigma_X^2 = \sum_{i=1}^{n} (X_i - E(X))^2 \cdot pr(X_i) = 4250$


Standard deviation of random variable sales,

or, $\sigma_X = \sqrt{\sigma_X^2} = \sqrt{4250} = 65.2$ approx.


The mean value calculated above indicates that in the long run the average sales 
will be 150 units per month. The variance and the standard deviations measure 
the variation or dispersion of random variable values about the mean or the 
expected value of random variable. 
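
The arithmetic of Example 1 can be reproduced in a few lines. This is a minimal sketch using the sales figures and probabilities given in the example; the variable names are chosen for this illustration, and the alternative (shortcut) form of the variance is computed as a cross-check.

```python
from math import sqrt

sales = [50, 100, 150, 200, 250, 300]
probs = [0.10, 0.30, 0.30, 0.15, 0.10, 0.05]

mean = sum(x * p for x, p in zip(sales, probs))
variance = sum((x - mean) ** 2 * p for x, p in zip(sales, probs))
# Shortcut form: sum of X_i^2 * pr(X_i) minus the squared mean
variance_alt = sum(x * x * p for x, p in zip(sales, probs)) - mean ** 2
std_dev = sqrt(variance)

print(round(mean, 2), round(variance, 2), round(variance_alt, 2), round(std_dev, 1))
```

Both forms of the variance should give 4250, matching the table.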


Example 2: Given are the mean values of four different random variables viz., 
A, B, C, and D. 


$\bar{A} = 20$, $\bar{B} = 40$, $\bar{C} = 10$, $\bar{D} = 5$
Find the mean value of the random variable (A + B + C + D)
Solution:

$$E(A + B + C + D) = E(A) + E(B) + E(C) + E(D)$$
$$= \bar{A} + \bar{B} + \bar{C} + \bar{D}$$
$$= 20 + 40 + 10 + 5$$
$$= 75$$

Hence, the mean value of random variable (A + B + C + D) is 75.


Example 3: If X represents the number of heads that appear when one coin 
is tossed and Y the number of heads that appear when two coins are tossed, 
compare the variances of the random variables X and Y. The probability 
distributions of X and Y are as follows: 


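$X_i$    $pr(X_i)$        $Y_i$    $pr(Y_i)$
0        1/2              0        1/4
1        1/2              1        1/2
                          2        1/4
Total = 1                 Total = 1

Solution:

$$\text{Variance of } X = \sigma_X^2 = \left(0 - \frac{1}{2}\right)^2 \left(\frac{1}{2}\right) + \left(1 - \frac{1}{2}\right)^2 \left(\frac{1}{2}\right) = \frac{1}{8} + \frac{1}{8} = \frac{1}{4}$$

$$\text{Variance of } Y = \sigma_Y^2 = (0 - 1)^2 \left(\frac{1}{4}\right) + (1 - 1)^2 \left(\frac{1}{2}\right) + (2 - 1)^2 \left(\frac{1}{4}\right) = \frac{1}{4} + 0 + \frac{1}{4} = \frac{1}{2}$$

The variance of the number of heads for two coins is double the variance
of the number of heads for one coin.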


3.2.2 Moment Generating Functions 


According to probability theory, moment generating function generates the moments
for the probability distribution of a random variable X, and can be defined as:

$$M_X(t) = E(e^{tX}), \quad t \in \mathbb{R}$$

When the moment generating function exists in an interval around t = 0, the nth moment
becomes,

$$E(X^n) = M_X^{(n)}(0) = \left[\frac{d^n M_X(t)}{dt^n}\right]_{t=0}$$

The moment generating function, for probability distribution condition being
continuous or not, can also be given by Riemann-Stieltjes integral,

$$M_X(t) = \int_{-\infty}^{\infty} e^{tx}\, dF(x)$$

Where F is the cumulative distribution function.


For X having a continuous probability density function f(x), the moment generating
function becomes,

$$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\, dx = \int_{-\infty}^{\infty} \left(1 + tx + \frac{t^2 x^2}{2!} + \cdots\right) f(x)\, dx$$

Note: Since the exponential function is positive, the moment generating function
of X always exists, either as a real number or as positive infinity.


1. Prove that when X shows a discrete distribution having density function f,
then,

$$M_X(t) = \sum_{x \in S} e^{tx} f(x)$$

2. When X is continuous with density function f, then,

$$M_X(t) = \int_{S} e^{tx} f(x)\, dx$$

3. Consider that X and Y are independent. Show that,

$$M_{X+Y}(t) = M_X(t)\, M_Y(t)$$
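
The way a moment generating function "generates" moments can be sketched numerically. The example assumes a fair die as the discrete distribution and approximates the derivatives at t = 0 by finite differences; the die, the step size h and the function name `M` are assumptions made for this illustration, not part of the text.

```python
from math import exp

faces = range(1, 7)          # fair die, chosen only for illustration
p = 1 / 6

def M(t: float) -> float:
    """Moment generating function M_X(t) = E(e^{tX}) = sum of e^{tx} f(x)."""
    return sum(exp(t * x) * p for x in faces)

h = 1e-3                      # step for the finite-difference derivatives
first_moment = (M(h) - M(-h)) / (2 * h)            # approximates M'(0) = E(X)
second_moment = (M(h) - 2 * M(0) + M(-h)) / h**2   # approximates M''(0) = E(X^2)

print(round(first_moment, 3), round(second_moment, 3))
```

For a fair die the exact values are E(X) = 3.5 and E(X²) = 91/6, so the two printed numbers should be close to these.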
3.2.3 Probability Density Functions—Discrete and Continuous 


Probability density may be defined as a measure of existence. Logic can be analysed
with the help of a graph. In graphical representation, a probability density function
gives information about the existence of samples at various locations and the entire
graph area can be considered as a sample space.


A mathematical description of samples in regions can be given under a 
coordinate system. The regions and the existence of samples at various locations 
can be described in this coordinate system by mathematical functions. For example, 
to locate a point in a sample space it is expressed as a pair of (x,y) coordinates 
and a mathematical function describes a location for each sample. 


The function that shows the location of each sample is called a ‘density
function’. By showing locations of each sample, density of the samples can be
understood. If we assume that the odds of selecting any one sample is the same as
for any other, then the PDF (probability density function), p(x, y), gives the
probability of a point in sample space to lie between two given limits of the variables.


Thus, in case of a discrete variable x, the probability function, PDF(x), is the
probability of occurrence of x. For a variable of continuous nature, PDF(x) shows the
probability density of the variable x. Thus, the probability of a value between x and x + dx
is PDF(x) × dx.

A ‘cumulative distribution function’, CDF(x), shows the probability that the
said variable assumes a value ≤ x.


The mean or the average value is available in the majority of cases. For a discrete
variable, this MEAN is taken as the sum of the products x × PDF(x). In case of
continuous variables, MEAN is given by an integral of x × PDF(x), integrated over the range.

Let f be a non-negative function mapping R → R. Then a probability distribution
has density f if the probability of the interval [a, b] is given by

$$\int_a^b f(x)\,dx$$

for any two numbers a and b.


The total integral of f must be 1. The converse is also true: if a non-negative function f
has total integral equal to 1, then f represents the probability density for some
probability distribution.


A probability density function is, in fact, a refined version of a histogram with
very small or infinitesimal intervals. Due to this, the curve is smooth. If sampling is
done, taking many values of a random variable which is of continuous nature, a
histogram is produced that shows the probability density in
very narrow output ranges. A function can be termed a probability density function if and
only if it is non-negative and the area under the graph is 1. Putting this mathematically,
with the logical connective showing ‘conjunction’, it is given as:

$$f(x) \ge 0 \;\; \forall x \;\wedge\; \int_{-\infty}^{\infty} f(x)\,dx = 1$$

Not all distributions have a density function. A distribution is said to have a density
function f(x) only if its CDF, denoted as F(x), is continuous.


This is expressed mathematically as:

$$\frac{d}{dx} F(x) = f(x)$$


Link between Discrete and Continuous Distributions 


A PDF describes a variable whose distribution is continuous, i.e., a variable
that can take any value in an interval [a, b].


The probability distribution of a discrete random variable may also be represented
by a density using a delta function. If we consider a binary discrete random variable taking two
distinct values, say −1 and +1, which are equally likely, the probability density of such a variable
is given by:

$$f(t) = \frac{1}{2}\left(\delta(t + 1) + \delta(t - 1)\right)$$

We may generalize this as follows: If a discrete variable assumes ‘n’ different
values in the set of real numbers, then the associated probability density function is
given by:

$$f(t) = \sum_{i=1}^{n} p_i\, \delta(t - x_i)$$

Where $x_i$ (i = 1, 2, 3, ....., n) stand for the discrete values of the
variable and $p_i$ (where i = 1, 2, ...., n) are the probabilities associated with these
values.


This representation is used for knowing the characteristics of such a variable: its mean, its variance
and kurtosis.


The method is also used to show mathematically the characteristics of Brownian
movement and for deciding on its initial configuration.


Probability Functions Associated with Multiple Variables 


In case of random variables of continuous nature, $X_i$, where i = 1, 2, 3, 4, .....,
n, one can define a PDF for the whole set. This is termed the joint PDF,
which is defined as a function of the n variables. The density associated with a
single variable alone is known by a different name, MDF, which means marginal density function.


Independence


Random variables $X_1, \ldots, X_n$ of continuous nature are independent of each
other if and only if

$$f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n).$$

Corollary


If the JPDF (joint probability density function) of a vector of n random variables
can be written as a product of n functions of one variable,

$$f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = f_1(x_1) \cdots f_n(x_n),$$

then all these variables are independent of each other. The MPDF (marginal
probability density function) for each is expressed as:

$$f_{X_i}(x_i) = \frac{f_i(x_i)}{\int f_i(x)\,dx}$$

The following example illustrates the definition of an MDPF (multidimensional probability
density function) as stated above, in the case of a function of a set of
two variables. Let us call R a 2-dimensional random vector of coordinates (X, Y):
the probability to obtain R in the quarter plane of positive x and y is

$$P(X > 0, Y > 0) = \int_0^{\infty} \int_0^{\infty} f_{X,Y}(x, y)\,dx\,dy$$


Sums of Independent Random Variables 


Let there be two independent random variables, u and v, each of which has a PDF; then
the density of their sum, u + v, is the convolution of their separate density
functions. This is given mathematically, as below:

$$f_{u+v}(x) = \int_{-\infty}^{\infty} f_u(y)\, f_v(x - y)\,dy$$
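
The convolution idea is easiest to see in the discrete case, where the integral becomes a sum. The sketch below is a discrete analogue, not the continuous formula itself: it assumes two independent fair dice (the variable Z of Table 3.1) and the hypothetical helper name `convolve`.

```python
from fractions import Fraction
from collections import defaultdict

def convolve(pmf_u, pmf_v):
    """Distribution of U + V for independent discrete U and V."""
    total = defaultdict(Fraction)
    for y, p_u in pmf_u.items():
        for z, p_v in pmf_v.items():
            total[y + z] += p_u * p_v      # P(U = y) * P(V = z) adds to P(U + V = y + z)
    return dict(total)

die = {face: Fraction(1, 6) for face in range(1, 7)}
two_dice = convolve(die, die)

print(two_dice[2], two_dice[7], two_dice[12])   # 1/36 1/6 1/36
```

The sum of two dice takes the values 2 to 12, with 7 the most likely total, as the convolution shows.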


Check Your Progress


1. Define variance of random variable.
2. Explain discrete and continuous random variables.
3. Define variance of random variable.
4. Explain the moment generating function with reference to probability theory.
5. What is a density function?
6. What is meant by independence of a random variable?


3.3 EXPECTATION OF RANDOM VARIABLE 


The expected value (or mean) of X is the weighted average of the possible
values that X can take. Here X is a discrete random variable and each value is
weighted according to the probability of its occurrence.
The expected value of X is usually written as E(X) or μ.

$$E(X) = \sum x\, P(X = x)$$

Hence, the expected value is the sum of:


Each of the possible outcomes × The probability of the outcome occurring

Therefore, the expectation is the outcome you expect of an experiment.

Let us consider the following example,

What is the expected value when we roll a fair die?


There are six possible outcomes 1, 2, 3, 4, 5, 6. Each one of these has a probability 
of 1/6 of occurring. Let X be the outcome of the experiment. 


Then, 
P(X=1) = 1/6 (this shows that the probability that the outcome of the 
experiment is 1 is 1/6) 
P(X= 2) = 1/6 (the probability that you throw a 2 is 1/6) 
P(X= 3) = 1/6 (the probability that you throw a 3 is 1/6) 
P(X= 4) = 1/6 (the probability that you throw a 4 is 1/6) 
P(X=5) = 1/6 (the probability that you throw a 5 is 1/6) 
P(X= 6) = 1/6 (the probability that you throw a 6 is 1/6) 
E(X) = 1 × P(X = 1) + 2 × P(X = 2) + 3 × P(X = 3) + 4 × P(X = 4) +
5 × P(X = 5) + 6 × P(X = 6)
Therefore, 


E(X) = 1/6 + 2/6 + 3/6 + 4/6 + 5/6 + 6/6 = 7/2 or 3.5 


Hence, the expectation is 3.5, which is also the halfway between the possible 
values the die can take, and so this is what you should have expected. 


Expected Value of a Function of X
To find E[f(X)], where f(X) is a function of X, we use the following formula:

$$E[f(X)] = \sum f(x)\, P(X = x)$$

Let us consider the above example of die, and calculate E(X²)
Using the notation above, f(x) = x²
f(1) = 1, f(2) = 4, f(3) = 9, f(4) = 16, f(5) = 25, f(6) = 36
P(X = 1) = 1/6, P(X = 2) = 1/6, etc.
Hence, E(X²) = 1/6 + 4/6 + 9/6 + 16/6 + 25/6 + 36/6 = 91/6 = 15.167


The expected value of a constant is just the constant, as for example E(1) = 1.
Multiplying a random variable by a constant multiplies the expected value by that
constant.


Therefore, E[2X] = 2E[X]
An important formula, where a and b are constants, is:
E[aX + b] = aE[X] + b

Hence, we can say that the expectation is a linear operator. 
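
The die example and the linearity rule can be checked with exact fractions. This is a minimal sketch; the helper `expect` and the constants a = 2, b = 3 are assumptions chosen for illustration.

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die

def expect(f, pmf):
    """E[f(X)] = sum of f(x) * P(X = x) over all values x."""
    return sum(f(x) * p for x, p in pmf.items())

a, b = 2, 3                                       # arbitrary constants for the check
print(expect(lambda x: x, pmf))                   # 7/2
print(expect(lambda x: x**2, pmf))                # 91/6
print(expect(lambda x: a * x + b, pmf))           # 10
```

The last value equals a × 7/2 + b = 10, which is the linearity property stated above.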

Variance 


The variance of a random variable tells us something about the spread of the
possible values of the variable. For a discrete random variable X, the variance of
X is written as Var(X).

$$\mathrm{Var}(X) = E[(X - \mu)^2]$$

Where, μ is the expected value E(X)
This can also be written as:

$$\mathrm{Var}(X) = E(X^2) - \mu^2$$

The standard deviation of X is the square root of Var(X).


Note: The variance does not behave in the same way as expectation, when we
multiply and add constants to random variables.

$$\mathrm{Var}[aX + b] = a^2\, \mathrm{Var}(X)$$

Because,

$$\mathrm{Var}[aX + b] = E[(aX + b)^2] - (E[aX + b])^2$$
$$= E[a^2X^2 + 2abX + b^2] - (aE(X) + b)^2$$
$$= a^2E(X^2) + 2abE(X) + b^2 - a^2E^2(X) - 2abE(X) - b^2$$
$$= a^2E(X^2) - a^2E^2(X) = a^2\, \mathrm{Var}(X)$$
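
The result Var[aX + b] = a² Var(X) can be confirmed on the die example. The following is a sketch under assumed constants a = 3 and b = 10; the helper names are hypothetical.

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die

def expect(f):
    return sum(f(x) * p for x, p in pmf.items())

def variance(f):
    """Var[f(X)] = E[f(X)^2] - (E[f(X)])^2."""
    return expect(lambda x: f(x) ** 2) - expect(f) ** 2

a, b = 3, 10                                      # arbitrary constants for the check
var_x = variance(lambda x: x)
var_axb = variance(lambda x: a * x + b)
print(var_x, var_axb)                             # 35/12 105/4
```

Adding b shifts every value but leaves the spread unchanged, while multiplying by a scales the variance by a² = 9.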


3.4 CHEBYSHEV’S INEQUALITY 


The Chebyshev polynomials $\{T_n(x)\}$ are orthogonal on $(-1, 1)$ with respect to
the weight function $w(x) = (1 - x^2)^{-1/2}$. The Chebyshev polynomial is defined by
the following relation:


For $x \in [-1, 1]$, define
$T_n(x) = \cos(n \cos^{-1}(x))$ for each $n \ge 0$.
It is not obvious from this definition that $T_n(x)$ is an $n$th degree polynomial in
$x$, but now it will be proved that it is. First note that
$T_0(x) = \cos 0 = 1$ and $T_1(x) = \cos(\cos^{-1}(x)) = x$.


For $n \ge 1$, introduce the substitution $\theta = \cos^{-1} x$ to change this
equation to


$T_n(x) = T_n(\theta(x)) \equiv T_n(\theta) = \cos(n\theta)$, where $\theta \in [0, \pi]$.


A recurrence relation is derived by noting that:


$T_{n+1}(\theta) = \cos(n\theta + \theta) = \cos(n\theta)\cos\theta - \sin(n\theta)\sin\theta$


And $T_{n-1}(\theta) = \cos(n\theta - \theta) = \cos(n\theta)\cos\theta + \sin(n\theta)\sin\theta$.


Adding these equations gives the following:
$T_{n+1}(\theta) + T_{n-1}(\theta) = 2\cos(n\theta)\cos\theta$.
Returning to the variable $x$ and solving for $T_{n+1}(x)$ you have, for each $n \ge 1$,
$T_{n+1}(x) = 2\cos(n \arccos x)\cdot x - T_{n-1}(x) = 2x\,T_n(x) - T_{n-1}(x)$.
Since $T_0(x)$ and $T_1(x)$ are both polynomials in $x$, $T_{n+1}(x)$ will be a
polynomial in $x$ for each $n$.
[Chebyshev Polynomials] $T_0(x) = 1$, $T_1(x) = x$,


and, for $n \ge 1$, $T_{n+1}(x)$ is the polynomial of degree $n + 1$ given by
$T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x)$.
The recurrence relation implies that $T_n(x)$ is a polynomial of degree $n$, and
it has leading coefficient $2^{n-1}$ when $n \ge 1$. The next three Chebyshev polynomials
therefore are as follows:


$T_2(x) = 2x\,T_1(x) - T_0(x) = 2x^2 - 1$.
$T_3(x) = 2x\,T_2(x) - T_1(x) = 4x^3 - 3x$.
And $T_4(x) = 2x\,T_3(x) - T_2(x) = 8x^4 - 8x^2 + 1$.


The graphs of $T_1$, $T_2$, $T_3$, and $T_4$ are shown in Figure 3.1. Notice that each
of the graphs is symmetric to either the origin or the y-axis, and that each assumes
a maximum value of 1 and a minimum value of $-1$ on the interval $[-1, 1]$.


Fig. 3.1 Graph of Chebyshev Polynomials

The first few Chebyshev polynomials are summarized as follows:


$T_0(x) = 1$
$T_1(x) = x$
$T_2(x) = 2x^2 - 1$
$T_3(x) = 4x^3 - 3x$
$T_4(x) = 8x^4 - 8x^2 + 1$
$T_5(x) = 16x^5 - 20x^3 + 5x$
$T_6(x) = 32x^6 - 48x^4 + 18x^2 - 1$


Note: The coefficient of $x^n$ in $T_n(x)$ is always $2^{n-1}$ (for $n \ge 1$).
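The table above follows mechanically from the recurrence. The sketch below generates the coefficient lists; the function name `chebyshev_coefficients` and the list-of-coefficients representation (increasing powers of x) are assumptions made for illustration.

```python
def chebyshev_coefficients(n_max):
    """Return [T_0, ..., T_n_max], each as a list of coefficients in
    increasing powers of x, using T_{n+1} = 2x T_n - T_{n-1}."""
    polys = [[1], [0, 1]]                     # T_0 = 1, T_1 = x
    for n in range(1, n_max):
        prev, cur = polys[n - 1], polys[n]
        nxt = [0] + [2 * c for c in cur]      # multiply T_n by 2x
        for i, c in enumerate(prev):          # subtract T_{n-1}
            nxt[i] -= c
        polys.append(nxt)
    return polys[: n_max + 1]

polys = chebyshev_coefficients(6)
for n, coeffs in enumerate(polys):
    print(f"T_{n}: {coeffs}")
```

Reading each list from right to left gives the polynomial in the usual order, so the last entry is the leading coefficient.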


By using the Chebyshev polynomials, you can express $1, x, x^2, \ldots, x^6$ as
follows:
$1 = T_0(x)$
$x = T_1(x)$
$x^2 = \frac{1}{2}[T_0(x) + T_2(x)]$
$x^3 = \frac{1}{4}[3T_1(x) + T_3(x)]$
$x^4 = \frac{1}{8}[3T_0(x) + 4T_2(x) + T_4(x)]$
$x^5 = \frac{1}{16}[10T_1(x) + 5T_3(x) + T_5(x)]$
and $x^6 = \frac{1}{32}[10T_0(x) + 15T_2(x) + 6T_4(x) + T_6(x)]$


These expressions are useful in the economization of power series.
For example, for the series of $\sin x$, after omitting $T_5(x)$ you have
$\sin x \approx 0.8802\,T_1(x) - 0.03906\,T_3(x)$.
Substituting $T_1(x) = x$ and $T_3(x) = 4x^3 - 3x$ you have,
$\sin x \approx 0.9974x - 0.1562x^3$


which gives $\sin x$ correctly up to three significant digits with only two terms for any
value of $x$ in $[-1, 1]$.


Chebyshev Inequality


If $X$ is a random variable having $\mu$ as expected value and $\sigma^2$ as finite variance,
then for a positive real number $k$

$\Pr(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$.


Only when $k > 1$ do we get useful information. This is equivalent to

$\Pr(|X - \mu| \ge \alpha) \le \frac{\sigma^2}{\alpha^2}$.


For example, when $k = \sqrt{2}$, at least half of the values lie in the open
interval $(\mu - \sqrt{2}\sigma,\ \mu + \sqrt{2}\sigma)$.

The theorem provides loose bounds. However, without further assumptions on the
distribution, the bounds provided by Chebyshev's inequality cannot be improved upon.
For example, when $k > 1$, the following distribution, having $\sigma = 1/k$, meets
the bound exactly.


$\Pr(X = -1) = 1/(2k^2)$,
$\Pr(X = 0) = 1 - 1/k^2$,
$\Pr(X = 1) = 1/(2k^2)$,
$\Pr(|X - \mu| \ge k\sigma) = 1/k^2$.
Equality holds exactly for any distribution that is a linear transformation of this
example, whereas strict inequality holds for any distribution that is not.
The theorem is useful as it applies to random variables of any distribution, and
these bounds can be computed by knowing only the mean and variance.


An observation is rarely more than a few standard deviations away from the
mean. Chebyshev's inequality gives the following bounds, which apply to all
distributions for which it is possible to define a standard deviation.

At least:

• 50% of the values lie within $\sqrt{2}$ standard deviations of the mean
• 75% of the values lie within 2 standard deviations
• 89% lie within 3 standard deviations
• 94% lie within 4 standard deviations
• 96% lie within 5 standard deviations
• 97% lie within 6 standard deviations


Standard deviations are always taken from the mean.


Generally:
At least $(1 - 1/k^2) \times 100\%$ of the values lie within $k$ standard deviations.
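The bounds listed above, and the three-point distribution that attains the bound, can be reproduced with a few lines of code. This is a sketch under stated assumptions: the function names are hypothetical, and a small tolerance is used when comparing floating-point values.

```python
import math

def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    return 1 - 1 / k**2

def tight_example(k):
    """Three-point distribution on {-1, 0, 1} that meets the bound exactly.
    Returns (sigma, Pr(|X - mu| >= k * sigma))."""
    pmf = {-1: 1 / (2 * k**2), 0: 1 - 1 / k**2, 1: 1 / (2 * k**2)}
    mu = sum(x * p for x, p in pmf.items())
    sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))
    # Small tolerance guards against floating-point rounding in k * sigma.
    tail = sum(p for x, p in pmf.items() if abs(x - mu) >= k * sigma - 1e-12)
    return sigma, tail

for k in (math.sqrt(2), 2, 3, 4, 5, 6):
    print(f"k = {k:.3f}: at least {100 * chebyshev_lower_bound(k):.1f}% within k sd")

sigma, tail = tight_example(2)
print(sigma, tail)
```

For k = 2 the example has standard deviation 1/2 and places probability 1/4 at distance k standard deviations from the mean, so the bound 1/k^2 is attained.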
Example 4: Represent $\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \ldots$ by using Chebyshev polynomials
to obtain three significant digits accuracy in the computation of $\sin x$.
Solution: Using Chebyshev polynomials, $\sin x$ is approximated as:

$\sin x \approx T_1(x) - \frac{1}{24}[3T_1(x) + T_3(x)] + \frac{1}{1920}[10T_1(x) + 5T_3(x) + T_5(x)]$


$\approx 0.8802\,T_1(x) - 0.03906\,T_3(x) + 0.00052\,T_5(x)$


As the coefficient of $T_5(x)$ is 0.00052 and as $|T_5(x)| \le 1$ for all $x$ in $[-1, 1]$,
omitting the last term above still gives three significant digit
accuracy.
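The effect of dropping the last term can be examined numerically. The sketch below compares the two-term form 0.9974x - 0.1562x^3 quoted in the text with the library sine on a grid over [-1, 1]; the grid spacing and the function name are illustrative assumptions.

```python
import math

def economized_sin(x):
    """Two-term approximation 0.9974 x - 0.1562 x^3 quoted in the text."""
    return 0.9974 * x - 0.1562 * x**3

# Largest absolute error on an evenly spaced grid over [-1, 1].
grid = [i / 1000 for i in range(-1000, 1001)]
max_error = max(abs(math.sin(x) - economized_sin(x)) for x in grid)
print(f"max |sin x - approximation| on [-1, 1] = {max_error:.5f}")
```

The check is restricted to [-1, 1] because the bound on the omitted Chebyshev term holds only on that interval.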

Example 5: Express the Taylor series expansion of


$e^{-x} = 1 - x + \frac{x^2}{2!} - \frac{x^3}{3!} + \frac{x^4}{4!} - \ldots$


in terms of Chebyshev polynomials.


Solution: The Chebyshev polynomial representation of
$e^{-x} = 1 - x + \frac{x^2}{2!} - \frac{x^3}{3!} + \ldots$ is


$e^{-x} = T_0(x) - T_1(x) + \frac{1}{4}[T_0(x) + T_2(x)] - \frac{1}{24}[3T_1(x) + T_3(x)]$
$+ \frac{1}{192}[3T_0(x) + 4T_2(x) + T_4(x)] - \frac{1}{1920}[10T_1(x) + 5T_3(x) + T_5(x)] + \ldots$


Thus,
$e^{-x} = 1.26606\,T_0(x) - 1.13021\,T_1(x) + 0.27148\,T_2(x)$
$- 0.04427\,T_3(x) + 0.0054687\,T_4(x) - 0.0005208\,T_5(x) + \ldots$


Now, if you expand $T_0(x)$, $T_1(x)$, $T_2(x)$, $T_3(x)$, $T_4(x)$ and $T_5(x)$ using
their polynomial equivalents and truncate after six terms, then you have,

$e^{-x} = 1.00045 - 1.000022x + 0.4991992x^2 - 0.166488x^3$
$+ 0.043794x^4 - 0.008687x^5$
Comparing this representation with the Taylor series representation, you
observe that there is a slight difference in the coefficients of different powers of $x$.
The main advantage of this representation as a sum of Chebyshev polynomials is
that, for a given error bound, you can truncate the series with a smaller number of
terms compared to the Taylor series. Also, the error is more uniformly distributed
for various arguments. The possibility of a series with a lower number of terms is
called economization of power series. The maximum error in the six terms of the
Chebyshev representation of $e^{-x}$ is 0.00045 whereas the error in the six terms of the
Taylor series representation of $e^{-x}$ is 0.0014. Thus, you have to add one more
term in the Taylor series to ensure that the error is less than that in the Chebyshev
approximation.


Check Your Progress


7. What is an expected value? 
8. Define variance with respect to discrete random variable. 


9. Summarize the first few Chebyshev polynomials. 


3.5 ANSWERS TO CHECK YOUR PROGRESS 
QUESTIONS 


1. A random variable is a variable that takes on different values as a result of the
outcomes of a random experiment.


2. A random variable can be either discrete or continuous. If a random variable
is allowed to take on only a limited number of values, it is a discrete random
variable but if it is allowed to assume any value within a given range, it is a
continuous random variable.


(i) A probability cannot be less than zero or greater than one, i.e., $0 \le pr \le 1$,
where $pr$ represents probability.


(ii) The sum of all the probabilities assigned to each value of the random
variable must be exactly one.


3. The variance of a random variable is defined as the sum of the squared deviations
of the values of the random variable from the expected value, weighted by their
probability.

4. According to probability theory, the moment generating function generates the
moments for the probability distribution of a random variable $X$, and can be
defined as:


$M_X(t) = E(e^{tX}),\ t \in \mathbb{R}$

5. The function that shows the location of each sample is called a 'density
function'. By showing locations of each sample, density of the samples can
be understood.


6. Random variables $X_1, \ldots, X_n$ of continuous nature are independent of
each other if and only if $f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n)$.

7. The expected value (or mean) of $X$ is the weighted average of the possible
values that $X$ can take. Here $X$ is a discrete random variable and each value
is weighted according to the probability of its occurrence. The expected value
of $X$ is usually written as $E(X)$ or $\mu$.


8. The variance of a random variable tells us something about the spread of the
possible values of the variable. For a discrete random variable $X$, the variance
of $X$ is written as Var($X$).


$\mathrm{Var}(X) = E[(X - \mu)^2]$
Where, $\mu$ is the expected value $E(X)$.


9. The first few Chebyshev polynomials can be summarized as follows:


$T_0(x) = 1$
$T_1(x) = x$

3.6 SUMMARY 


• A function which assigns numerical values to each element of the set of
events that may occur (i.e., every element in the sample space) is termed a
random variable.


• If a random variable is allowed to take on only a limited number of values,
it is a discrete random variable but if it is allowed to assume any value within
a given range, it is a continuous random variable.


• The sum of all the probabilities assigned to each value of the random variable
must be exactly one.


• We want to measure the dispersion of a random variable ($X$) about its expected
value, i.e., $E(X)$. The variance and the standard deviation provide measures
of this dispersion.


• The variance of a random variable is defined as the sum of the squared
deviations of the values of the random variable from the expected value, weighted
by their probability.


• According to probability theory, the moment generating function generates the
moments for the probability distribution of a random variable $X$, and can be
defined as:


$M_X(t) = E(e^{tX}),\ t \in \mathbb{R}$


• In graphical representation, a probability density function gives information
about the existence of samples at various locations and the entire graph
area can be considered as a sample space.


• The mean or the average value is available in the majority of cases. This MEAN
is taken as the sum of products $x \times \mathrm{PDF}(x)$. In case of continuous variables,
MEAN is given by an integral of $x \times \mathrm{PDF}(x)$, integrated over the range.


• If a JPDF (joint probability distribution function) of a vector of $n$ random
variables can be written as a product of $n$ functions of one variable,
$f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = f_1(x_1) \cdots f_n(x_n)$, then all these variables are
independent of each other.


• The maximum error in the six terms of the Chebyshev representation of $e^{-x}$ is
0.00045 whereas the error in the six terms of the Taylor series representation
of $e^{-x}$ is 0.0014.


3.7 KEY WORDS 


• Expected value: The expected value (or mean) of $X$ is the weighted
average of the possible values that $X$ can take.


• Variance: The variance of a random variable tells us something about the
spread of the possible values of the variable.


• Chebyshev polynomials: The Chebyshev polynomials $\{T_n(x)\}$ are
orthogonal on $(-1, 1)$ with respect to the weight function $w(x) = (1 - x^2)^{-1/2}$.


3.8 SELF-ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 


1. Explain the techniques of assigning probability.
2. What is moment generating function?
3. Explain the concept of expectation of a random variable.
4. Explain briefly the terms 'expectation' and 'expected value'.
5. Define variance.


Long-Answer Questions 


1. Discuss the techniques of assigning probabilities.
2. Give the mathematical representation of the variance of a random variable.
3. Define the various theories of moment generating function with syntax and
example.
4. Explain the probability density functions of discrete and continuous type.
5. Derive the relation for the expected value of a function of $X$.
6. Briefly describe the Chebyshev inequality.


4.0 INTRODUCTION


A random variable is a phenomenon of interest in which the observed outcomes of 
an activity are entirely by chance, are absolutely unpredictable and may differ 
from response to response. By definition of randomness, each possible entity has 
the same chance of being considered. For instance, lottery drawings are considered 
to be random drawings so that each number has exactly the same chance of being 
picked up. Similarly, the value of the outcome of a toss of a fair coin is random,
since a head or a tail has the same chance of occurring. A random variable may be 
qualitative or quantitative in nature. The qualitative random variables yield 
categorical responses so that the responses fit into one category or another. For 
example, a response to a question such as ‘Are you currently unemployed?’ would 
fit in the category of either ‘Yes’ or ‘No’. On the other hand, quantitative random 
variables yield numerical responses. 


In this unit, you will study the multivariate distribution, the distribution of
two random variables, and conditional distribution and expectation.


4.1 OBJECTIVES 


After going through this unit, you will be able to:
• Analyse the multivariate distributions
• Explain the distribution of two random variables
• Discuss the conditional distribution and expectation


4.2 MULTIVARIATE DISTRIBUTION


Uniform or Rectangular Distribution 


Each possible value of the random variable $x$ has the same probability in the
uniform distribution. If $x$ takes values $x_1, x_2, \ldots, x_n$, then,


$p(x_i) = \frac{1}{n}$


The numbers on a die follow the uniform distribution,
$p(x_i) = \frac{1}{6}$ (Here $x_i = 1, 2, 3, 4, 5, 6$)


Bernoulli Trials 


In a Bernoulli experiment, an event $E$ either happens or does not happen ($E'$).
Examples are, getting a head on tossing a coin, getting a six on rolling a die, and so
on.


The Bernoulli random variable is written,
$X = 1$ if $E$ occurs
$X = 0$ if $E'$ occurs


Since there are two possible values, it is a case of a discrete variable
where,


Probability of success $= p = p(E)$
Probability of failure $= 1 - p = q = p(E')$


We can write,
For $k = 1$, $f(k) = p$
For $k = 0$, $f(k) = q$


For $k = 0$ or $1$, $f(k) = p^k q^{1-k}$
Geometric Distribution 


Suppose the probability of success $p$ in a series of independent trials remains
constant.


Suppose the first success occurs after $x$ failures, i.e., there are $x$ failures
preceding the first success. The probability of this event will be given by
$p(x) = q^x p$ $(x = 0, 1, 2, \ldots)$.


This is the geometric distribution and can be derived from the negative
binomial. If we put $r = 1$ in the Negative Binomial distribution:


$p(x) = {}^{x+r-1}C_{r-1}\, p^r q^x$


We get the geometric distribution,


$p(x) = {}^{x}C_0\, p^1 q^x = p q^x$


$\sum p(x) = \sum_{x=0}^{\infty} q^x p = \frac{p}{1 - q} = 1$
$E(x)$ = Mean = $\frac{q}{p}$
Variance = $\frac{q}{p^2}$

Example 1: Find the expectation of the number of failures preceding the first
success in an infinite series of independent trials with constant probability $p$ of
success.


Solution: The probability of success in,
1st trial $= p$ (Success at once)
2nd trial $= qp$ (One failure then success)
3rd trial $= q^2 p$ (Two failures then success), and so on
The expected number of failures preceding the success,
$E(x) = 0 \cdot p + 1 \cdot pq + 2 \cdot pq^2 + \ldots$
$= pq(1 + 2q + 3q^2 + \ldots)$
$= pq \cdot \frac{1}{(1 - q)^2} = qp \cdot \frac{1}{p^2} = \frac{q}{p}$
Since $p = 1 - q$.
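The result E(x) = q/p, together with the variance q/p^2, can be checked by summing the series numerically. The sketch below truncates the infinite sums after a large number of terms; the value p = 0.25 and the function names are hypothetical choices for illustration.

```python
def geometric_pmf(x, p):
    """P(X = x): x failures occur before the first success."""
    return (1 - p) ** x * p

def truncated_moments(p, terms=10_000):
    """Approximate the mean and variance by summing the first `terms` terms."""
    mean = sum(x * geometric_pmf(x, p) for x in range(terms))
    second = sum(x * x * geometric_pmf(x, p) for x in range(terms))
    return mean, second - mean**2

p = 0.25  # hypothetical success probability
q = 1 - p
mean, var = truncated_moments(p)
print(mean, q / p)       # series sum versus q/p
print(var, q / p**2)     # series sum versus q/p^2
```

The tail of the series decays geometrically, so truncating after a few thousand terms leaves a negligible remainder for this value of p.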
Hypergeometric Distribution 


From a finite population of size $N$, a sample of size $n$ is drawn without replacement.
Let there be $N_1$ successes out of $N$.
The number of failures is $N_2 = N - N_1$.


The distribution of the random variable $X$, which is the number of successes
obtained in the above case, is called the Hypergeometric distribution.


$p(x) = \frac{{}^{N_1}C_x \; {}^{N_2}C_{n-x}}{{}^{N}C_n}$ $(x = 0, 1, 2, \ldots, n)$


Here $x$ is the number of successes in the sample and $n - x$ is the number of
failures in the sample.
It can be shown that,


Mean: $E(X) = n\frac{N_1}{N}$


Variance: $\mathrm{Var}(X) = \frac{N - n}{N - 1}\left(\frac{nN_1}{N} - \frac{nN_1^2}{N^2}\right)$

Example 2: There are 20 lottery tickets with three prizes. Find the probability
that out of 5 tickets purchased exactly two prizes are won.


Solution: We have $N = 20$, $N_1 = 3$, $N_2 = N - N_1 = 17$, $x = 2$, $n = 5$.


$p(2) = \frac{{}^{3}C_2 \; {}^{17}C_3}{{}^{20}C_5} = \frac{5}{38} \approx 0.132$

The probability of getting no prize $p(0) = \frac{{}^{3}C_0 \; {}^{17}C_5}{{}^{20}C_5}$

The probability of getting exactly 1 prize $p(1) = \frac{{}^{3}C_1 \; {}^{17}C_4}{{}^{20}C_5}$
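The lottery probabilities can be evaluated exactly with the standard library. This is a minimal sketch; the function name `hypergeom_pmf` and its argument names are assumptions that mirror the symbols N, N_1 and n used in the text.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, N1, n):
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of N items containing N1 successes."""
    return Fraction(comb(N1, x) * comb(N - N1, n - x), comb(N, n))

# 20 tickets, 3 prizes, 5 tickets purchased.
probs = {x: hypergeom_pmf(x, N=20, N1=3, n=5) for x in range(4)}
for x, p in probs.items():
    print(x, p, float(p))

mean = sum(x * p for x, p in probs.items())
print(mean)  # compare with n * N1 / N
```

The four probabilities sum to one, and the mean agrees with the formula n N_1 / N given earlier.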


Example 3: Examine the nature of the distribution if $r$ balls are drawn, one at a
time without replacement, from a bag containing $m$ white and $n$ black balls.
Solution: It is the hypergeometric distribution. It corresponds to the probability
that $x$ balls will be white out of $r$ balls so drawn and is given by,

$p(x) = \frac{{}^{m}C_x \; {}^{n}C_{r-x}}{{}^{m+n}C_r}$


Multinomial 


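There are $k$ possible outcomes of trials, viz., $x_1, x_2, \ldots, x_k$ with probabilities
$p_1, p_2, \ldots, p_k$. $n$ independent trials are performed. The multinomial distribution
gives the probability that out of these $n$ trials, $x_1$ occurs $n_1$ times, $x_2$ occurs
$n_2$ times, and so on. This is given by


$\frac{n!}{n_1!\, n_2! \cdots n_k!}\; p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$


Where, $\sum_{i=1}^{k} n_i = n$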


Characteristic Features of the Binomial Distribution 


The following are the characteristics of Binomial distribution:
1. It is a discrete distribution.
2. It gives the probability of $x$ successes and $n - x$ failures in $n$ trials,
irrespective of the order in which they occur.
3. The experiment consists of $n$ repeated trials.
4. Each trial results in a success or a failure.
5. The probability of success remains constant from trial to trial.
6. The trials are independent.
7. The success probability $p$ of any outcome remains constant over time. This
condition is usually not fully satisfied in situations involving management and
economics, for example, the probability of response from successive
informants is not the same. However, it may be assumed that the condition
is reasonably well satisfied in many cases and that the outcome of one trial
does not depend on the outcome of another. This condition too, may not be
fully satisfied in many cases. An investigator may not approach a second
informant with the same frame of mind as used for the first informant.


8. The Binomial distribution depends on two parameters $n$ and $p$. Each set of
different values of $n$, $p$ has a different Binomial distribution.
9. If $p = 0.5$, the distribution is symmetrical. For a symmetrical distribution
in $n$ trials,
Prob ($X = 0$) = Prob ($X = n$)
i.e., the probabilities of 0 or $n$ successes in $n$ trials will be the same. Similarly,
Prob ($X = 1$) = Prob ($X = n - 1$), and so on.
If $p > 0.5$, the distribution is not symmetrical. The probabilities on the right
are larger than those on the left. The reverse case is when $p < 0.5$.
When $n$ becomes large, the distribution becomes bell shaped. Even when $n$
is not very large but $p = 0.5$, it is fairly bell shaped.
10. The Binomial distribution can be approximated by the normal. As $n$ becomes
large and $p$ is close to 0.5, the approximation becomes better.


Example 4: If the ratio $n/N$, i.e., sample size to population size, is not small, the
result given by the Binomial may not be reliable. Comment.


Solution: When the distribution is Binomial, each successive trial, being independent
of other trials, has constant probability of success. If the sampling of $n$ items is
without replacement from a population of size $N$, the probability of success of any
event depends upon what happened in the previous events. In this case the Binomial
cannot be used unless the ratio $n/N$ is small. Even then there is no guarantee of
getting accurate results.


The Binomial should be used only if the ratio $n/N$ is very small, say less than
0.05.

Example 5: Explain the concept of a discrete probability distribution.

Solution: If a random variable $x$ assumes $n$ discrete values $x_1, x_2, \ldots, x_n$ with
respective probabilities $p_1, p_2, \ldots, p_n$ $(p_1 + p_2 + \ldots + p_n = 1)$, then the
values $x_i$ together with their probabilities $p_i$ form the
probability distribution of $x$.


The frequency function or frequency distribution of $x$ is defined by $p(x)$,
which for different values $x_1, x_2, \ldots, x_n$ of $x$ gives the corresponding probabilities:
$p(x_i) = p_i$ where, $p(x) \ge 0$ and $\sum p(x) = 1$
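Example 6: For the following probability distribution, find $p(x > 4)$ and
$p(x \le 4)$:
x    | 0 | 1 | 2   | 3   | 4   | 5
p(x) | 0 | a | a/2 | a/2 | a/4 | a/4


Solution: The solution is obtained as follows:


Since $\sum p(x) = 1$, $0 + a + \frac{a}{2} + \frac{a}{2} + \frac{a}{4} + \frac{a}{4} = 1$

$\frac{5a}{2} = 1$ or $a = \frac{2}{5}$


$p(x > 4) = p(x = 5) = \frac{a}{4} = \frac{1}{10}$


$p(x \le 4) = 0 + a + \frac{a}{2} + \frac{a}{2} + \frac{a}{4} = \frac{9a}{4} = \frac{9}{10}$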

Example 7: A fair coin is tossed 400 times. Find the mean number of heads and
the corresponding standard deviation.


Solution: This is a case of Binomial distribution with $p = q = \frac{1}{2}$, $n = 400$


The mean number of heads is given by $\mu = np = 400 \times \frac{1}{2} = 200$


And S. D. $\sigma = \sqrt{npq} = \sqrt{400 \times \frac{1}{2} \times \frac{1}{2}} = 10$


Example 8: A manager has thought of 4 planning strategies each of which has an
equal chance of being successful. What is the probability that at least one of his
strategies will work if he tries them in 4 situations? Here $p = \frac{1}{4}$, $q = \frac{3}{4}$


Solution: The probability that none of the strategies will work is given by,

$p(0) = {}^{4}C_0 \left(\frac{1}{4}\right)^0 \left(\frac{3}{4}\right)^4 = \left(\frac{3}{4}\right)^4 = \frac{81}{256}$


The probability that at least one will work is given by $1 - \left(\frac{3}{4}\right)^4 = \frac{175}{256}$


Example 9: Suppose the proportion of people preferring a car C is 0.5. Let $X$
denote the number of people out of a set of 3 who prefer C. Find the probabilities of
0, 1, 2, 3 of them preferring C.


Solution: The solution is obtained as follows:


$p(X = 0) = {}^{3}C_0 (0.5)^0 (0.5)^3 = \frac{1}{8}$
$p(X = 1) = {}^{3}C_1 (0.5)^1 (0.5)^2 = \frac{3}{8}$
$p(X = 2) = {}^{3}C_2 (0.5)^2 (0.5)^1 = \frac{3}{8}$
$p(X = 3) = {}^{3}C_3 (0.5)^3 (0.5)^0 = \frac{1}{8}$

$\mu = E(X) = \sum x_i p_i = 0 \times \frac{1}{8} + 1 \times \frac{3}{8} + 2 \times \frac{3}{8} + 3 \times \frac{1}{8} = 1.5$


$\sigma^2 = E(X - \mu)^2 = E(X^2) - \mu^2 = \sum x_i^2 p_i - \mu^2$


$= 0 \times \frac{1}{8} + 1 \times \frac{3}{8} + 4 \times \frac{3}{8} + 9 \times \frac{1}{8} - 1.5^2$


$= 0.75$
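The binomial probabilities, mean and variance for n = 3 and p = 0.5 can be reproduced exactly. The sketch below is illustrative; the function name `binomial_pmf` is an assumption, and exact fractions are used so that the results match 1/8, 3/8, 1.5 and 0.75 without rounding.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """Return {x: P(X = x)} for x = 0..n, with P(X = x) = nCx p^x q^(n-x)."""
    q = 1 - p
    return {x: comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)}

pmf = binomial_pmf(3, Fraction(1, 2))
mean = sum(x * px for x, px in pmf.items())
variance = sum(x * x * px for x, px in pmf.items()) - mean**2

print(pmf)
print(mean, variance)   # compare with np and npq
```

The same function evaluates the probability in the strategies example, where n = 4 and p = 1/4.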
Example 10: For the Poisson distribution, write the probabilities of 0, 1, 2, ....
successes.
Solution: The solution is obtained as follows:


x | $p(x) = e^{-m}\frac{m^x}{x!}$
0 | $p(0) = e^{-m} m^0/0! = e^{-m}$
1 | $p(1) = e^{-m}\frac{m}{1!} = p(0) \cdot m$
2 | $p(2) = e^{-m}\frac{m^2}{2!} = p(1) \cdot \frac{m}{2}$
3 | $p(3) = e^{-m}\frac{m^3}{3!} = p(2) \cdot \frac{m}{3}$


and so on.


Total of all probabilities $\sum p(x) = 1$
Example 11: What are the raw moments of Poisson distribution?
Solution: First raw moment $\mu'_1 = m$
Second raw moment $\mu'_2 = m^2 + m$
Third raw moment $\mu'_3 = m^3 + 3m^2 + m$
Example 12: For a Poisson distribution $p(0) = p(1)$, find $p(x > 0)$.
Solution: We have $e^{-m} m^0/0! = e^{-m} m^1/1!$ so that $m = 1$
$\therefore p(0) = e^{-1} = 1/2.718 = 0.37$
$p(x > 0) = 1 - p(0) = 0.63$
Example 13: It is claimed that the TV branded M is demanded by 70% of
customers. If $X$ is the number of TVs of brand M demanded, find the probability
distribution of $X$ when there are four customers.


Solution: If all the four demand M then $p(4) = 0.7^4 = 0.2401$. The probability
distribution is:

X | 4      | 3      | 2      | 1      | 0
p | 0.2401 | 0.4116 | 0.2646 | 0.0756 | 0.0081
These values may be plotted.


Note: Poisson approximation to the Binomial: when the number of trials $n$ is large
and the probability $p$ of a success is small, the Binomial distribution can be approximated by
the Poisson. This approximation is useful in practice.


If $p = 0.002$, $n = 1000$, $m = np = 2$
The probability of 2 successes in 1000 trials is,
$p(X = 2) = {}^{n}C_2\, p^2 q^{n-2} = {}^{1000}C_2 (0.002)^2 (0.998)^{998}$


Similarly, $p(X = 3) = {}^{1000}C_3 (0.002)^3 (0.998)^{997}$, etc.


These terms are difficult to calculate. If we employ the Poisson, we have a
much easier task before us.


$m = np = 1000 \times 0.002 = 2$


$p(X = 2) = \frac{e^{-m} m^2}{2!} = \frac{e^{-2}\, 2^2}{2!} = 0.1353 \times 2 = 0.2706$
$p(X = 3) = \frac{e^{-m} m^3}{3!} = \frac{e^{-2}\, 2^3}{3!} = 0.1804$

Example 14: One in every 100 items made on a machine is defective. Out of 25
items picked, find the probability of 1 item being defective.


Solution: $p = 0.01$, $q = 0.99$, $n = 25$, $np = 0.25$
Binomial: $p(1) = {}^{25}C_1 (0.01)^1 (0.99)^{24} = 0.1964$


Poisson: $p(1) = \frac{e^{-0.25}(0.25)^1}{1!} = 0.1947$
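The closeness of the Poisson approximation can be seen by evaluating both formulas side by side. The sketch below is illustrative; the function names are assumptions, and the parameter triples are the cases discussed in the text.

```python
from math import comb, exp, factorial

def binomial_prob(x, n, p):
    """Exact Binomial probability nCx p^x (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_prob(x, m):
    """Poisson probability e^(-m) m^x / x! with m = np."""
    return exp(-m) * m**x / factorial(x)

# (n, p, x) triples taken from the worked cases above.
for n, p, x in [(25, 0.01, 1), (1000, 0.002, 2), (1000, 0.002, 3)]:
    b = binomial_prob(x, n, p)
    q = poisson_prob(x, n * p)
    print(f"n={n}, p={p}, x={x}: binomial={b:.4f}, poisson={q:.4f}")
```

With a computer the exact Binomial terms are no longer hard to evaluate, but the comparison shows how little is lost by using the Poisson form when n is large and p is small.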


Exponential Distribution


In probability theory and statistics, the exponential distributions, also known as
negative exponential distributions, are a set of continuous probability distributions.
They describe the times between events in a Poisson process, i.e., a process in
which events occur continuously and independently at a constant average rate.


Probability Density Function


The Probability Density Function (PDF) of an exponential distribution is defined
as,


$f(x; \lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$,
$f(x; \lambda) = 0$ for $x < 0$.


Here $\lambda > 0$ is the parameter of the distribution and is frequently termed the rate
parameter. The distribution is supported on the interval $[0, \infty)$. When a random
variable $X$ has exponential distribution, then it is written as $X \sim \mathrm{Exp}(\lambda)$.


Cumulative Distribution Function 


The cumulative distribution function is defined as,


$F(x; \lambda) = 1 - e^{-\lambda x}$ for $x \ge 0$,
$F(x; \lambda) = 0$ for $x < 0$.


Alternative Parameterization 


The Probability Density Function (PDF) of an exponential distribution can also be
defined using alternative parameterization as,


$f(x; \beta) = \frac{1}{\beta} e^{-x/\beta}$ for $x \ge 0$,
$f(x; \beta) = 0$ for $x < 0$.

Here $\beta > 0$ is a scale parameter of the distribution. It is the reciprocal of the rate
parameter, $\lambda$. In this notation, $\beta$ is considered a survival parameter: if
a random variable $X$ is the duration of time that a given system manages to survive
and $X \sim \mathrm{Exponential}(\beta)$, then $E[X] = \beta$. Thus, the expected duration of survival of
the system is $\beta$ units of time. The parameterization involving the 'rate parameter'
arises in the context of events arriving at a rate $\lambda$, when the time between
events has a mean of $\beta = \lambda^{-1}$.


Occurrence and Uses 


The exponential distribution occurs naturally when describing the lengths of the
inter-arrival times in a homogeneous Poisson process. It can be regarded
as a continuous counterpart of the geometric distribution, which describes the number
of Bernoulli trials necessary for a discrete process to change state. Thus, the
exponential distribution describes the time for a continuous process to change
state.


Exponential variables can be used to model situations where specific events
occur with a constant probability per unit length. In queuing theory, the service
times of agents in a system are frequently modeled as exponentially distributed
variables, for example the time taken by a bank teller to serve a customer. The
length of a process that can be considered as a sequence of several independent events
is modeled using a variable following the Erlang distribution, which is the
distribution of the sum of several independent exponentially distributed variables.


The exponential distribution is also used in reliability theory and reliability
engineering. Because this distribution has the memoryless property, it is
well suited to model the constant hazard rate portion of the bathtub curve in
reliability theory. In physics, when a gas is observed at a fixed temperature and
pressure in a uniform gravitational field, the altitudes of the various molecules
also follow an approximate exponential distribution.


Properties of Exponential Distribution 


Mean: The mean or expected value of an exponentially distributed random variable
$X$ with rate parameter $\lambda$ is given as,


$E[X] = \frac{1}{\lambda}$
Variance: The variance of $X$ is given as,
$\mathrm{Var}[X] = \frac{1}{\lambda^2}$


Median: The median of $X$ is given as,


$m[X] = \frac{\ln 2}{\lambda} < E[X]$,


Here, $\ln$ refers to the natural logarithm. Hence, the absolute difference between
the mean and median is given as shown below, in accordance with the
median-mean inequality.


$|E[X] - m[X]| = \frac{1 - \ln 2}{\lambda} < \frac{1}{\lambda}$ = Standard deviation
À À 


Memorylessness: It is an important property of the exponential distribution.
When a random variable $T$ is exponentially distributed, its
conditional probability satisfies the following relation:


$\Pr(T > s + t \mid T > s) = \Pr(T > t)$ for all $s, t \ge 0$.


Let us use the above equation to explain memorylessness. Suppose we have waited
for 30 seconds and the first arrival has not yet happened ($T > 30$). The
conditional probability that we need to wait more than another 10 seconds for
the first arrival ($T > 30 + 10$) is then equal to the initial probability that we
need to wait more than 10 seconds for the first arrival ($T > 10$).
This does not mean that the events $T > 40$ and $T > 30$ are independent events.
The exponential distributions and the geometric distributions are the only
memoryless probability distributions. The exponential distribution also has a
constant hazard function.
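The 30-second and 10-second illustration can be checked directly from the survival function Pr(T > t) = e^{-λt}. In the sketch below the rate 0.05 per second is a hypothetical value chosen only for illustration; any positive rate gives the same agreement.

```python
import math

def survival(t, lam):
    """Pr(T > t) for an exponentially distributed T with rate lam."""
    return math.exp(-lam * t)

lam = 0.05          # hypothetical rate (arrivals per second)
s, t = 30.0, 10.0   # already waited s seconds; wait t more

# Pr(T > s + t | T > s) = Pr(T > s + t) / Pr(T > s)
conditional = survival(s + t, lam) / survival(s, lam)
unconditional = survival(t, lam)
print(conditional, unconditional)
```

The ratio of survival probabilities cancels the factor e^{-λs}, which is the algebraic content of the memoryless property.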


Quantiles: The quantile function or inverse cumulative distribution function for
Exponential($\lambda$) is given as,


$F^{-1}(p; \lambda) = \frac{-\ln(1 - p)}{\lambda}$,


For $0 \le p < 1$. Therefore the quartiles are as follows:
First Quartile: $\ln(4/3)/\lambda$
Median: $\ln(2)/\lambda$
Third Quartile: $\ln(4)/\lambda$


Kullback-Leibler Divergence: The directed Kullback-Leibler divergence
between Exp($\lambda_0$) for the 'True' distribution and Exp($\lambda$) for the 'Approximating'
distribution is given as,


$\Delta(\lambda_0 \,\|\, \lambda) = \log(\lambda_0) - \log(\lambda) + \frac{\lambda}{\lambda_0} - 1$.


Maximum Entropy Distribution: The Exponential distribution with $\lambda = 1/\mu$ has
the largest entropy among all continuous probability distributions having support
$[0, \infty)$ and mean $\mu$.


Distribution of the Minimum of Exponential Random Variables 


Let $X_1, \ldots, X_n$ be independent exponentially distributed random variables with
rate parameters $\lambda_1, \ldots, \lambda_n$. Then, $\min\{X_1, \ldots, X_n\}$ is also exponentially
distributed with parameter $\lambda = \lambda_1 + \ldots + \lambda_n$.


This can be shown using the complementary cumulative distribution function as
shown below:


$\Pr(\min\{X_1, \ldots, X_n\} > x) = \Pr(X_1 > x \text{ and } \ldots \text{ and } X_n > x)$


$= \prod_{i=1}^{n} \Pr(X_i > x) = \prod_{i=1}^{n} \exp(-x\lambda_i) = \exp\left(-x \sum_{i=1}^{n} \lambda_i\right)$
The index of the variable which achieves the minimum is distributed according to
the law,


$\Pr(X_k = \min\{X_1, \ldots, X_n\}) = \frac{\lambda_k}{\lambda_1 + \ldots + \lambda_n}$.


Remember that $\max\{X_1, \ldots, X_n\}$ is not exponentially distributed.
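Both statements about the minimum can be illustrated by simulation. The sketch below is not a proof; the rates 1, 2 and 3, the number of trials and the seed are hypothetical choices, and the estimates are subject to sampling error.

```python
import random

def simulate_minimum(rates, trials=100_000, seed=0):
    """Estimate E[min X_i] and the frequency with which each X_k is the
    minimum, for independent exponential variables with the given rates."""
    rng = random.Random(seed)
    total = 0.0
    wins = [0] * len(rates)
    for _ in range(trials):
        draws = [rng.expovariate(lam) for lam in rates]
        smallest = min(draws)
        total += smallest
        wins[draws.index(smallest)] += 1
    return total / trials, [w / trials for w in wins]

rates = [1.0, 2.0, 3.0]   # hypothetical rate parameters
mean_min, win_freq = simulate_minimum(rates)
print(mean_min, 1 / sum(rates))                     # mean of the minimum vs 1/(sum of rates)
print(win_freq, [lam / sum(rates) for lam in rates])
```

The estimated mean of the minimum should be close to the reciprocal of the summed rates, and the winning frequencies close to each rate divided by that sum.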


Check Your Progress 
1. What are the Bernoulli trials? 
2. Define the term hypergeometric distribution. 


3. Give some characteristic features of the binomial distribution.


4. What is exponential distribution? 


4.3 DISTRIBUTION OF TWO RANDOM 
VARIABLE 


Mean of Random Variable or The Expected Value of 
Random Variable 


Mean of random variable is the sum of the values of the random variable weighted
by the probability that the random variable will take on the value. In other words,
it is the sum of the product of the different values of the random variable and their
respective probabilities. Symbolically, we write the mean of a random variable,
say $X$, as $\bar{X}$. The Expected value of the random variable is the average value
that would occur if we have to average an infinite number of outcomes of the
random variable. In other words, it is the average value of the random variable in
the long run. The expected value of a random variable is calculated by weighting
each value of a random variable by its probability and summing over all values.
The symbol for the expected value of a random variable is $E(X)$. Mathematically,
we can write the mean and the expected value of a random variable, $X$, as follows:


$\bar{X} = \sum_{i=1}^{n} X_i \cdot pr.(X_i)$
And,
$E(X) = \sum_{i=1}^{n} X_i \cdot pr.(X_i)$


Thus, the mean and expected value of a random variable are conceptually
and numerically the same but usually denoted by different symbols and as such
the two symbols, viz., $\bar{X}$ and $E(X)$ are completely interchangeable. We can,
therefore, express the two as follows:


$E(X) = \sum_{i=1}^{n} X_i \cdot pr.(X_i) = \bar{X}$


Where $X_i$ is the $i$th value $X$ can take.


Sum of Random Variables


If we are given the means or the expected values of different random variables,
say $X$, $Y$, and $Z$, then the mean of the random variable $(X + Y + Z)$ can
be obtained as under:


$E(X + Y + Z) = E(X) + E(Y) + E(Z) = \bar{X} + \bar{Y} + \bar{Z}$


Similarly, the expectation of a constant times a random variable is the
constant times the expectation of the random variable. Symbolically, we can
write this as under:


$E(cX) = cE(X) = c\bar{X}$


Where $cX$ is the constant times the random variable.


4.4 CONDITIONAL DISTRIBUTION AND 
EXPECTATION 


Expectation (Conditional) 


The expectation of a random variable X with probability density function (PDF)
p(x) is theoretically defined as:


$$E[X] = \int_{-\infty}^{\infty} x\,p(x)\,dx$$


If we consider two random variables X and Y (not necessarily independent), then
their combined behaviour is described by their joint probability density function
p(x, y), which is defined by:


$$P\{x \le X < x + dx,\; y \le Y < y + dy\} = p(x, y)\,dx\,dy$$


The marginal probability density of X is defined as,


$$p_X(x) = \int_{-\infty}^{\infty} p(x, y)\,dy$$


For any fixed value y of Y, the distribution of X is the conditional distribution of X
given Y = y, and it is denoted by p(x | y).


Expectation (Iterated) 


The expectation of the random variable is expressed as:


$$E[X] = E[E[X \mid Y]]$$

This expression is known as the ‘Theorem of Iterated Expectation’ or
‘Theorem of Double Expectation’. Symbolically, it can be expressed as:


(i) For the discrete case:
$$E[X] = \sum_{y} E[X \mid Y = y]\,P\{Y = y\}$$
(ii) For the continuous case:
$$E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y]\,f(y)\,dy$$
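
The discrete case of the theorem can be illustrated with a small Python sketch. The joint distribution below is hypothetical and the function names are assumptions made for this illustration only; the sketch simply compares E[X] computed directly with the sum of conditional expectations weighted by P{Y = y}.

```python
from fractions import Fraction

# Hypothetical joint distribution P{X = x, Y = y}, for illustration only.
joint = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
    (0, 1): Fraction(2, 8), (1, 1): Fraction(2, 8),
}

def marginal_y(joint):
    """P{Y = y}, obtained by summing the joint probabilities over x."""
    out = {}
    for (_, y), p in joint.items():
        out[y] = out.get(y, 0) + p
    return out

def conditional_expectation_x(joint, y):
    """E[X | Y = y] = sum of x * P{X = x, Y = y} / P{Y = y}."""
    p_y = marginal_y(joint)[y]
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y

def direct_expectation_x(joint):
    """E[X] computed directly from the joint distribution."""
    return sum(x * p for (x, _), p in joint.items())

def iterated_expectation_x(joint):
    """E[E[X | Y]]: conditional expectations weighted by P{Y = y}."""
    return sum(conditional_expectation_x(joint, y) * p_y
               for y, p_y in marginal_y(joint).items())

print(direct_expectation_x(joint), iterated_expectation_x(joint))
```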
Expectation: Continuous Variables 


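If x is a continuous random variable, we define that,


$$E(x) = \int_{-\infty}^{\infty} x\,P(x)\,dx = \mu$$

where P(x) is the density function of x.


The expectation of a function h(x) is,
$$E[h(x)] = \int_{-\infty}^{\infty} h(x)\,P(x)\,dx$$
The rth moment about the mean is,
$$E(x - \mu)^r = \int_{-\infty}^{\infty} (x - \mu)^r\,P(x)\,dx$$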


Example 15: A newspaper earns Rs. 100 a day if there is suspense in the news.
It loses Rs. 10 a day if the news is eventless. What is the expectation of
its earnings if the probability of suspense news is 0.4?
Solution: $E(x) = p_1 x_1 + p_2 x_2$

= 0.4 × 100 − 0.6 × 10

= 40 − 6 = 34
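Example 16: A player tossing three coins earns Rs. 10 for 3 heads, Rs. 6 for 2
heads and Re. 1 for 1 head. He loses Rs. 25 if 3 tails appear. Find his expectation.


Solution:
$$p(HHH) = \tfrac{1}{2}\cdot\tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{8} = p_1 \text{ (say), 3 heads}$$
$$p(HHT) = {}^3C_1\cdot\tfrac{1}{2}\cdot\tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{3}{8} = p_2 \text{ (say), 2 heads, 1 tail}$$
$$p(HTT) = {}^3C_2\cdot\tfrac{1}{2}\cdot\tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{3}{8} = p_3 \text{ (say), 1 head, 2 tails}$$
$$p(TTT) = \tfrac{1}{2}\cdot\tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{8} = p_4 \text{ (say), 3 tails}$$


$$E(x) = p_1 x_1 + p_2 x_2 + p_3 x_3 + p_4 x_4 = \tfrac{1}{8}(10) + \tfrac{3}{8}(6) + \tfrac{3}{8}(1) - \tfrac{1}{8}(25)$$


$$= \tfrac{6}{8} = \text{Rs. } 0.75$$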
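
The expectation in Example 16 can also be recomputed by listing all eight equally likely outcomes of three tosses. The following Python sketch is an illustration only; the function name and the dictionary of payoffs (taken from the statement of the example) are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import product

# Payoff in rupees by number of heads, as stated in Example 16.
payoff_by_heads = {3: 10, 2: 6, 1: 1, 0: -25}

def expected_winnings(payoffs, coins=3):
    """Average the payoff over all equally likely outcomes of fair coin tosses."""
    outcomes = list(product("HT", repeat=coins))
    p = Fraction(1, len(outcomes))          # each outcome has probability 1/8
    return sum(p * payoffs[outcome.count("H")] for outcome in outcomes)

print(expected_winnings(payoff_by_heads))
```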


Example 17: A and B roll a die. Whoever gets a 6 first wins Rs. 550. Find their
individual expectations if A makes a start. What will be the answer if B makes a
start?


Solution: A may get a 6 in the 1st trial, $p_1 = \tfrac{1}{6}$.
A may not get it in the 1st, B may not get it in the 2nd and A may get it in the 3rd,


$$p_2 = \tfrac{5}{6}\cdot\tfrac{5}{6}\cdot\tfrac{1}{6}, \text{ and so on.}$$


$$\text{A's winning chance} = \tfrac{1}{6} + \left(\tfrac{5}{6}\right)^2\tfrac{1}{6} + \left(\tfrac{5}{6}\right)^4\tfrac{1}{6} + \dots = \frac{1/6}{1 - 25/36} = \tfrac{6}{11}$$


$$\text{B's winning chance} = 1 - \tfrac{6}{11} = \tfrac{5}{11}$$


A wins Rs. 550 with probability $p = \tfrac{6}{11}$.


A gets nothing if he loses, with probability $q = \tfrac{5}{11}$.
Expectation of A = p.x + q.0


$$= \tfrac{6}{11} \times 550 + \tfrac{5}{11} \times 0 = 300$$


Similarly, B wins with $p = \tfrac{5}{11}$.


B gets nothing if he loses, with $q = \tfrac{6}{11}$.


$$\text{Expectation of B} = \tfrac{5}{11} \times 550 + \tfrac{6}{11} \times 0 = 250$$


If B starts first, his expectation would be 300 and A’s would be 250.
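
The geometric series used in Example 17 can be summed exactly in a few lines. The Python sketch below is illustrative only; the function name is an assumption of this sketch, and the closed form p / (1 − q²) is the sum of p + q²p + q⁴p + ... for q = 1 − p.

```python
from fractions import Fraction

def first_player_win_probability(p_success):
    """Chance that the player who rolls first wins: p / (1 - q^2), q = 1 - p."""
    q = 1 - p_success
    return p_success / (1 - q * q)

prize = 550
p_first = first_player_win_probability(Fraction(1, 6))
expectation_first = p_first * prize          # player who starts
expectation_second = (1 - p_first) * prize   # player who rolls second

print(p_first, expectation_first, expectation_second)
```

Whoever starts has the larger expectation, which is why the two figures swap when B makes the start.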


Example 18: Calculate the standard deviation (S.D.) when x takes the values 0,
1, 3 and 9 with probabilities 0.4, 0.2, 0.3 and 0.1.


Solution: x takes the values 0, 1, 3, 9 with probabilities 0.4, 0.2, 0.3, 0.1.
$$\mu = E(x) = \sum x_i p_i = 0 \times 0.4 + 1 \times 0.2 + 3 \times 0.3 + 9 \times 0.1 = 2.0$$
$$E(x^2) = \sum x_i^2 p_i = 0^2 \times 0.4 + 1^2 \times 0.2 + 3^2 \times 0.3 + 9^2 \times 0.1 = 11.0$$
$$V(x) = E(x^2) - \mu^2 = 11 - 4 = 7$$


$$S.D.(x) = \sqrt{7} \approx 2.65$$


Example 19: The purchase of some shares can give a profit of Rs. 400 with
probability 1/100 and Rs. 300 with probability 1/20. Comment on a fair price of
the share.


Solution: Expected value $E(x) = \sum x_i p_i = 400 \times \tfrac{1}{100} + 300 \times \tfrac{1}{20} = 4 + 15 = 19$.
Hence a fair price of the share is Rs. 19.


Example 20: Find the variance and standard deviation (S.D.) of the following
probability distribution:


x_i | 1   | 2   | 3   | 4
p_i | 0.1 | 0.3 | 0.2 | 0.4


Solution: $E(x) = \sum p_i x_i = 0.1 \times 1 + 0.3 \times 2 + 0.2 \times 3 + 0.4 \times 4 = 2.9$
Variance $(x) = V(x) = E(x^2) - \bar{x}^2$


$$\sum p_i x_i^2 - \bar{x}^2 = 0.1 \times 1^2 + 0.3 \times 2^2 + 0.2 \times 3^2 + 0.4 \times 4^2 - 2.9^2$$
$$= 0.1 + 1.2 + 1.8 + 6.4 - 8.41 = 1.09$$


$$S.D.(x) = \sqrt{V(x)} = \sqrt{1.09} = 1.044$$
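
The steps of Example 20 can be sketched in Python as follows. This is only an illustrative check; the function names are assumptions of the sketch, and the values and probabilities are those used in the solution above.

```python
import math

def mean(values, probs):
    """E(x) = sum of p_i * x_i."""
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """V(x) = E(x^2) - [E(x)]^2."""
    m = mean(values, probs)
    return sum(p * x * x for x, p in zip(values, probs)) - m * m

values = [1, 2, 3, 4]
probs = [0.1, 0.3, 0.2, 0.4]

v = variance(values, probs)
print(round(mean(values, probs), 3), round(v, 3), round(math.sqrt(v), 3))
```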


Example 21: Prove that the variance of a constant is zero.
Solution: If k is a constant, it has no variability.
$$V(k) = 0$$

If the constant k is attached to a variable x then,

$$V(kx) = k^2 V(x)$$

$$V(2x) = 4V(x)$$

$$V(2 + 3x) = V(3x) = 9V(x)$$
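
A short numerical check of these rules is sketched below in Python. The distribution is hypothetical and the helper names are assumptions made for the illustration; exact fractions are used so the equalities hold without rounding.

```python
from fractions import Fraction

def variance(dist):
    """V(x) = sum of p * (x - mean)^2 for a distribution {value: probability}."""
    m = sum(x * p for x, p in dist.items())
    return sum(p * (x - m) ** 2 for x, p in dist.items())

def transform(dist, a, b):
    """Distribution of a + b*x for a non-zero b."""
    return {a + b * x: p for x, p in dist.items()}

# Hypothetical distribution, for illustration only.
dist = {1: Fraction(1, 10), 2: Fraction(3, 10), 3: Fraction(2, 10), 4: Fraction(4, 10)}

print(variance({5: Fraction(1)}))                          # variance of a constant
print(variance(transform(dist, 2, 3)), 9 * variance(dist))  # V(2 + 3x) and 9V(x)
```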

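Example 22: A box contains $2^n$ tickets of which ${}^nC_i$ tickets bear the number i (i =
0, 1, 2, ...., n). If a set of m tickets is drawn, find the expected value of the sum of
their numbers.


Solution: To find E(A), where $A = x_1 + x_2 + \dots + x_m$. Here each $x_i$ can take the values 0, 1,
2, ...., n with probabilities,


$$\frac{{}^nC_0}{2^n},\ \frac{{}^nC_1}{2^n},\ \frac{{}^nC_2}{2^n},\ \dots,\ \frac{{}^nC_n}{2^n}$$
$$E(x_i) = \sum p\,x = \frac{{}^nC_0}{2^n}\cdot 0 + \frac{{}^nC_1}{2^n}\cdot 1 + \frac{{}^nC_2}{2^n}\cdot 2 + \dots + \frac{{}^nC_n}{2^n}\cdot n$$
$$= \frac{1}{2^n}\left[{}^nC_1 + 2\,{}^nC_2 + \dots + n\,{}^nC_n\right]$$
$$= \frac{n}{2^n}\left[1 + {}^{n-1}C_1 + {}^{n-1}C_2 + \dots + {}^{n-1}C_{n-1}\right] \quad \text{(by taking n common)}$$


$$= \frac{n}{2^n}\cdot 2^{n-1} = \frac{n}{2} \quad [\text{Expand } (1 + 1)^{n-1} \text{ and check}]$$


$$E(A) = \sum_{i=1}^{m} E(x_i) = \frac{mn}{2}$$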
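
The result of Example 22 can be verified for small n with the sketch below. It is written in Python for illustration; the function names are assumptions of the sketch. The sum over m tickets uses only the additive property of expectation, which holds whether or not the tickets are replaced.

```python
from fractions import Fraction
from math import comb

def expected_ticket_number(n):
    """E(x_i) when nCi of the 2^n tickets bear the number i, i = 0, 1, ..., n."""
    total = 2 ** n
    return sum(Fraction(comb(n, i), total) * i for i in range(n + 1))

def expected_sum(n, m):
    """E(A) for a set of m tickets, by adding the m individual expectations."""
    return m * expected_ticket_number(n)

for n in range(1, 6):
    print(n, expected_ticket_number(n))   # each value equals n/2
```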


Check Your Progress 


5. Explain the mean of random variable. 


6. What is conditional expectation? 


4.5 ANSWERS TO CHECK YOUR PROGRESS 
QUESTIONS 


1. In a Bernoulli experiment, an event E either happens or does not happen
(E′). Examples are getting a head on tossing a coin, getting a six on rolling
a die, and so on.
2. The distribution of the random variable X, which is the number of successes
obtained in the above case, is called the Hypergeometric distribution.
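$$p(x) = \frac{{}^{a}C_x \cdot {}^{N-a}C_{n-x}}{{}^{N}C_n} \quad (x = 0, 1, 2, \dots, n)$$

where N is the population size, a the number of successes in the population
and n the sample size.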


3. The following are the characteristics of Binomial distribution:
(a) It is a discrete distribution.


(b) It gives the probability of x successes and n − x failures in a specific
order.


(c) The experiment consists of n repeated trials.
(d) Each trial results in a success or a failure.


4. In probability theory and statistics, the exponential distributions, also known
as negative exponential distributions, are a set of continuous probability
distributions.


5. Mean of random variable is the sum of the values of the random variable 
weighted by the probability that the random variable will take on the value. 


6. The expectation of a random variable X with probability density function
(PDF) p(x) is theoretically defined as:


$$E[X] = \int_{-\infty}^{\infty} x\,p(x)\,dx$$


If we consider two random variables X and Y (not necessarily independent),
then their combined behaviour is described by their joint probability density
function p(x, y), which is defined by:


$$P\{x \le X < x + dx,\; y \le Y < y + dy\} = p(x, y)\,dx\,dy$$
The marginal probability density of X is defined as,
$$p_X(x) = \int_{-\infty}^{\infty} p(x, y)\,dy$$


For any fixed value y of Y, the distribution of X is the conditional distribution
of X given Y = y, and it is denoted by p(x | y).


4.6 SUMMARY 


• Each possible value of the random variable x has the same probability in the
uniform distribution. If x takes values $x_1, x_2, \dots, x_n$, then,


$$p(x_i) = \frac{1}{n}$$


• The multinomial distribution gives the probability that out of these n trials, $x_1$
occurs $n_1$ times, $x_2$ occurs $n_2$ times, and so on. This is given by the following
equation:


$$\frac{n!}{n_1!\,n_2! \cdots n_k!}\; p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$$


• The Binomial distribution depends on two parameters n and p. Each set of
different values of n, p has a different Binomial distribution.


• The Binomial distribution can be approximated by the normal. As n becomes
large and p is close to 0.5, the approximation becomes better.


• The Probability Density Function (PDF) of an exponential distribution can
also be defined using alternative parameterization as,


$$f(x; \beta) = \begin{cases} \dfrac{1}{\beta}\, e^{-x/\beta}, & x \ge 0, \\ 0, & x < 0. \end{cases}$$


• The Expected value of the random variable is the average value that would
occur if we were to average an infinite number of outcomes of the random
variable.


4.7 KEY WORDS 


• Cumulative distribution function: The cumulative distribution function of the
Exponential (λ) distribution is defined as,


$$F(x; \lambda) = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0, \\ 0, & x < 0. \end{cases}$$


• Quantiles: The quantile function or inverse cumulative distribution function
for Exponential (λ) is given as,


$$F^{-1}(p; \lambda) = \frac{-\ln(1 - p)}{\lambda}, \quad 0 \le p < 1$$


• Maximum entropy distribution: The Exponential distribution with λ = 1/μ
has the largest entropy among all continuous probability distributions having
support [0, ∞) and mean μ.


4.8 SELF-ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 


1. Explain about the uniform or rectangular distribution. 
2. What is geometric distribution? 

3. What is meant by multinomial? 

4. Explain the sum of random variables. 

5. What is iterated expectation? 
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Long-Answer Questions 


1. Briefly describe the probability density function. 
2. Discuss the properties of exponential distribution. 
3. Describe the expected value of random variables. 


4. Explain the continuous variables expectation. 
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5.0 INTRODUCTION 


Correlation analysis looks at the indirect relationships in sample survey data and 
establishes the variables which are most closely associated with a given action or 
mind-set. It is the process of finding how accurately the line fits using the 
observations. Correlation analysis can be referred to as the statistical tool used to
describe the degree to which one variable is related to another. The relationship, if 
any, is usually assumed to be a linear one. In fact, the word correlation refers to 
the relationship or interdependence between two variables. There are various 
phenomena which have relation to each other. The theory by means of which 
quantitative connections between two sets of phenomena are determined is called 
the ‘Theory of Correlation’. On the basis of the theory of correlation you can 
study the comparative changes occurring in two related phenomena and their cause- 
effect relation can also be examined. Thus, correlation is concerned with relationship 
between two related and quantifiable variables and can be positive or negative. 


In this unit you will study correlation, types of correlation and
properties of correlation coefficient, methods of studying correlation, scatter diagram,
Karl Pearson’s coefficient, rank coefficient and coefficient of determination.
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5.1 OBJECTIVES 


After going through this unit, you will be able to: 
• Define correlation
• Explain the different types of correlation
• Understand the function of correlation coefficient
• Understand the different methods of studying correlation


5.2 CORRELATION 


Correlation analysis is the statistical tool generally used to describe the degree to 
which one variable is related to another. The relationship, if any, is usually assumed 
to be a linear one. This analysis is used quite frequently in conjunction with regression
analysis to measure how well the regression line explains the variations of the 
dependent variable. In fact, the word correlation refers to the relationship or 
interdependence between two variables. There are various phenomena which have 
relation to each other. For instance, when demand of a certain commodity 
increases, then its price goes up and when its demand decreases then its price 
comes down. Similarly, with age the height of children increases; with height
the weight of children increases; with money supply the general level of prices
goes up. Such relationships can be noticed for several other phenomena as well.
The theory by means of which quantitative connections between two sets of 
phenomena are determined is called the Theory of Correlation. 


On the basis of the theory of correlation one can study the comparative 
changes occurring in two related phenomena and their cause-effect relation can 
be examined. It should, however, be borne in mind that relationships like ‘black cat
causes bad luck’, ‘filled-up pitchers result in good fortune’ and similar other beliefs
of the people cannot be explained by the theory of correlation since they are all 
imaginary and are incapable of being justified mathematically. Thus, correlation is 
concerned with the relationship between two related and quantifiable variables. If 
two quantities vary in sympathy so that a movement (an increase or decrease) in 
the one tends to be accompanied by a movement in the same or opposite direction 
in the other and the greater the change in the one, the greater is the change in the 
other, the quantities are said to be correlated. This type of relationship is known as 
correlation or what is sometimes called, in statistics, co-variation. 


For correlation it is essential that the two phenomena should have a cause-
effect relationship. If such a relationship does not exist then one should not talk of
correlation. For example, if the height of the students as well as the height of the
trees increases, then one should not call it a case of correlation because the two
phenomena, viz., the height of students and the height of trees, are not causally
related. But the relationship between the price of a commodity and its demand,
the price of a commodity and its supply, the rate of interest and savings, etc., are
examples of correlation since in all such cases the change in one phenomenon is
explained by a change in the other phenomenon.


Check Your Progress 
1. What are the different types of correlations? 
2. Explain the meaning of correlation analysis. 


3. What is the scatter diagram method? 


4. What is the least-squares method? 


5.3 TYPES OF CORRELATION 


It is appropriate here to mention that correlation in case of phenomena pertaining 
to natural sciences can be reduced to absolute mathematical terms, for example, 
heat always increases with light. But in phenomena pertaining to social sciences, it 
is often difficult to establish any absolute relationship between two phenomena. 
Hence, in social sciences we must take the fact of correlation being established if 
in a large number of cases, two variables always tend to move in the same or the 
opposite direction. 


Correlation can either be positive or it can be negative. Whether 
correlation is positive or negative would depend upon the direction in which the 
variables are moving. If both variables are changing in the same direction, then 
correlation is said to be positive but when the variations in the two variables take 
place in opposite direction, the correlation is termed as negative. This can be 
explained as follows: 


Changes in Independent    Changes in Dependent    Nature of
Variable                  Variable                Correlation
Increase (+)↑             Increase (+)↑           Positive (+)
Decrease (−)↓             Decrease (−)↓           Positive (+)
Increase (+)↑             Decrease (−)↓           Negative (−)
Decrease (−)↓             Increase (+)↑           Negative (−)


Correlation can either be linear or it can be non-linear. Non-linear 
correlation is also known as curvilinear correlation. The distinction is based upon 
the constancy of the ratio of change between the variables. When the amount of 
change in one variable tends to bear a constant ratio to the amount of change in the 
other variable then the correlation is said to be linear. In such a case, if the values
of the variables are plotted on a graph paper, then a straight line is obtained. This 
is why the correlation is known as linear correlation. But when the amount of 
change in one variable does not bear a constant ratio to the amount of change in 
the other variable, i.e., the ratio happens to be variable instead of constant, then 
the correlation is said to be non-linear or curvilinear. In such a situation we shall 
obtain a curve if the values of the variables are plotted on a graph paper. 
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Correlation can either be simple correlation or it can be partial 
correlation or it can be multiple correlation. The study of correlation for two 
variables (of which one is independent and the other is dependent) involves 
application of simple correlation. When more than two variables are involved in a
study relating to correlation then it can be a case of either multiple correlation or
partial correlation. Multiple correlation studies the relationship between a dependent
variable and two or more independent variables. In partial correlation, we measure 
the correlation between a dependent variable and one particular independent variable 
assuming that all other independent variables remain constant. 


Statisticians have developed two measures for describing the correlation 
between two variables viz., the coefficient of determination and the coefficient of 
correlation. 


5.4 PROPERTIES OF CORRELATION 
COEFFICIENT 


The coefficient of correlation, symbolically denoted by ‘r’, is an important measure to
describe how well one variable is explained by another. It measures the degree of
relationship between the two causally-related variables. The value of this coefficient
can never be more than +1 or less than −1. Thus, +1 and −1 are the limits of this
coefficient. For a unit change in the independent variable, if there happens to be a constant
change in the dependent variable in the same direction, then the value of the coefficient
will be +1, indicative of perfect positive correlation; but if such a change occurs
in the opposite direction, the value of the coefficient will be −1, indicating perfect
negative correlation. In practical life the possibility of obtaining either a perfect positive
or perfect negative correlation is very remote, particularly in respect of phenomena
concerning social sciences. If the coefficient of correlation has a zero value then it
means that there exists no correlation between the variables under study.


There are several methods of finding the coefficient of correlation but the
following ones are considered important:
(i) Coefficient of correlation by the method of least squares
(ii) Coefficient of correlation through product moment method or Karl
Pearson’s coefficient of correlation
(iii) Coefficient of correlation using simple regression coefficients
Whichever of these three methods we adopt, we get the same value of r.
Now, we explain in brief each one of these three methods of finding ‘r’.


5.5 METHODS OF STUDYING CORRELATION 


5.5.1 Scatter Diagram 


The least squares method of fitting a line (the line of best fit or the regression line)
through the scatter diagram is a method which minimizes the sum of the squared
vertical deviations from the fitted line. In other words, the line to be fitted will pass
through the points of the scatter diagram in such a fashion that the sum of the
squares of the vertical deviations of these points from the line will be a minimum.
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Fig 5.1 Scatter Diagram 


The meaning of the least squares criterion can be understood more
easily through reference to Figure 5.2, where the scatter diagram has
been reproduced along with a line which represents the least squares fit to the
data.
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Fig 5.2 Scatter Diagram, Regression Line and 
Short Vertical Lines Representing ‘e’ 


In Figure 5.2, the vertical deviations of the individual points from the line are
shown as the short vertical lines joining the points to the least squares line. These
deviations are denoted by the symbol ‘e’. The value ‘e’ varies from one point to
another. In some cases it is positive, in others it is negative. If the line drawn
happens to be the least squares line, then the value of $\sum e_i^2$ is the least possible. It
is because of this feature that the method is known as the Least Squares Method.


Why we insist on minimizing the sum of squared deviations is a question that
needs explanation. If we denote the deviations of the actual value Y from the
estimated value $\hat{Y}$ as $(Y - \hat{Y})$ or $e_i$, it is logical that we want $\sum (Y - \hat{Y})$ or
$\sum_{i=1}^{n} e_i$ to be as small as possible. However, merely examining $\sum (Y - \hat{Y})$ or
$\sum_{i=1}^{n} e_i$ is inappropriate since any $e_i$ can be positive or negative, and large
positive values and large negative values would cancel one another.


But large values of $e_i$, regardless of their sign, indicate a poor prediction. Even
if we ignore the signs while working out $\sum_{i=1}^{n} |e_i|$, the difficulties may continue to be
there. Hence, the standard procedure is to eliminate the effect of signs by squaring
each deviation. Squaring each term accomplishes two purposes viz., (i) It magnifies
(or penalizes) the larger errors, and (ii) It cancels the effect of the positive and
negative values (since a negative error squared becomes positive). The choice of
minimizing the squared sum of errors rather than the sum of the absolute values
implies that we would rather make many small errors than a few large errors. Hence,
in obtaining the regression line we follow the approach that the sum of the squared
deviations be minimum and on this basis work out the values of its constants viz., ‘a’
and ‘b’ or what is known as the intercept and the slope of the line. This is done with
the help of the following two normal equations:
$$\sum Y = na + b\sum X$$


$$\sum XY = a\sum X + b\sum X^2$$


In these two equations, ‘a’ and ‘b’ are unknowns and all other values viz.,
$\sum X$, $\sum Y$, $\sum X^2$, $\sum XY$ are the sums, the sum of squares and the sum of cross
products to be calculated from the sample data and ‘n’ means the number of
observations in the sample. Hence, one can solve these two equations for finding
the unknown values. Once these values are found, the regression line is said to
have been defined for the given problem. Statisticians have also derived a short cut
method through which these two equations can be rewritten so that the values of
‘a’ and ‘b’ can be directly obtained as follows:


$$b = \frac{n\sum XY - \sum X \cdot \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$


$$a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$$
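
The short cut formulae can be sketched in Python as follows. This is only an illustration: the function name is an assumption of the sketch, and the data are the father and son heights of Example 5 in this unit, for which the intercept and slope are stated there as 26.25 and 0.625.

```python
def fit_line(xs, ys):
    """Intercept a and slope b of the least squares line, from the short cut formulae."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b

# Heights of fathers (X) and sons (Y) from Example 5.
fathers = [63, 65, 66, 67, 67, 68]
sons = [66, 68, 65, 67, 69, 70]

a, b = fit_line(fathers, sons)
print(a, b)
```

The same values of a and b are obtained by solving the two normal equations simultaneously, since the short cut formulae are only a rearrangement of them.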


5.5.2 Karl Pearson’s Coefficient 


Karl Pearson’s method is the most widely used method for measuring the 
relationship between two variables. This coefficient is based on the following 
assumptions: 


(i) There is a linear relationship between the two variables which means that a
straight line would be obtained if the observed data is plotted on a graph.


(ii) The two variables are causally related which means that one of the variables
is independent and the other one is dependent.


(iii) A large number of independent causes operate in both the variables so as
to produce a normal distribution.


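According to Karl Pearson, ‘r’ can be worked out as follows:


$$r = \frac{\sum xy}{n\,\sigma_x \sigma_y}$$

Here, $x = (X - \bar{X})$
$y = (Y - \bar{Y})$


$\sigma_x$ = Standard deviation of X series and is equal to $\sqrt{\dfrac{\sum x^2}{n}}$


$\sigma_y$ = Standard deviation of Y series and is equal to $\sqrt{\dfrac{\sum y^2}{n}}$


n = Number of pairs of X and Y observed


A short cut formula known as the Product Moment Formula can be derived
from the earlier formula by substituting the expressions for $\sigma_x$ and $\sigma_y$:


$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$


These formulae are based on obtaining the true means (viz., $\bar{X}$ and $\bar{Y}$) first
and then performing all other calculations.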
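
A minimal Python sketch of Karl Pearson's coefficient, computed from deviations about the true means, is given below. The function name is an assumption of this sketch, and the data are the father and son heights of Example 5 in this unit.

```python
import math

def pearson_r(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x, y as deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_xx = sum(dx * dx for dx in dev_x)
    sum_yy = sum(dy * dy for dy in dev_y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Heights of fathers (X) and sons (Y) from Example 5.
fathers = [63, 65, 66, 67, 67, 68]
sons = [66, 68, 65, 67, 69, 70]

r = pearson_r(fathers, sons)
print(round(r, 3), round(r * r, 3))
```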


5.5.3 Rank Coefficient 


If observations on two variables are given in the form of ranks and not as numerical 
values, it is possible to compute what is known as rank correlation between the 
two series. 

The rank correlation, written ρ, is a descriptive index of agreement between
ranks over individuals. It is the same as the ordinary coefficient of correlation
computed on ranks, but its formula is simpler.


$$\rho = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)}$$


Here, n is the number of observations and $D_i$ the positive difference between
ranks associated with the individual i.


Like r, the rank correlation lies between —1 and +1. 
Example 1: The ranks given by two judges to 10 individuals are as follows:


                  Rank given by
Individual    Judge I    Judge II    D = |x − y|    D²
                 x           y
    1            1           7            6         36
    2            2           5            3          9
    3            7           8            1          1
    4            9          10            1          1
    5            8           9            1          1
    6            6           4            2          4
    7            4           1            3          9
    8            3           6            3          9
    9           10           3            7         49
   10            5           2            3          9
                                             ΣD² = 128


Solution: The rank correlation is given by,

$$\rho = 1 - \frac{6\sum D^2}{n^3 - n} = 1 - \frac{6 \times 128}{10^3 - 10} = 1 - 0.776 = 0.224$$

The value of ρ = 0.224 shows that the agreement between the judges is not
high.


Example 2: In the previous case, compute r and compare.


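Solution: The simple coefficient of correlation r for the previous data is calculated
as follows:


   x       y      x²      y²      xy
   1       7       1      49       7
   2       5       4      25      10
   7       8      49      64      56
   9      10      81     100      90
   8       9      64      81      72
   6       4      36      16      24
   4       1      16       1       4
   3       6       9      36      18
  10       3     100       9      30
   5       2      25       4      10

Σx = 55   Σy = 55   Σx² = 385   Σy² = 385   Σxy = 321


$$r = \frac{\sum xy - n\bar{x}\bar{y}}{\sqrt{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)}} = \frac{321 - 10\left(\frac{55}{10}\right)\left(\frac{55}{10}\right)}{\sqrt{\left[385 - 10\left(\frac{55}{10}\right)^2\right]\left[385 - 10\left(\frac{55}{10}\right)^2\right]}}$$
$$= \frac{18.5}{\sqrt{82.5 \times 82.5}} = \frac{18.5}{82.5} = 0.224$$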

This shows that the Spearman ρ for any two sets of ranks is the same as
the Pearson r for the set of ranks. But it is much easier to compute ρ.

Often, the ranks are not given. Instead, the numerical values of observations
are given. In such a case, we must attach the ranks to these values to calculate ρ.
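
The agreement between ρ and r noted for the judges' ranks can be reproduced with the Python sketch below. It is an illustration only; the function names are assumptions of the sketch, and the ranks are those of Example 1.

```python
import math

def spearman_rho(rank_x, rank_y):
    """rho = 1 - 6 * sum(D^2) / (n^3 - n), valid when there are no tied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

def pearson_r(xs, ys):
    """Ordinary coefficient of correlation from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Ranks given by the two judges in Example 1.
judge_1 = [1, 2, 7, 9, 8, 6, 4, 3, 10, 5]
judge_2 = [7, 5, 8, 10, 9, 4, 1, 6, 3, 2]

print(round(spearman_rho(judge_1, judge_2), 3), round(pearson_r(judge_1, judge_2), 3))
```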


Example 3:
Marks in    Marks in    Rank in    Rank in
Maths       Stats       Maths      Stats       D      D²

45          60          4          2           2      4
47          61          3          1           2      4
60          58          1          3           2      4
38          48          5          4           1      1
50          46          2          5           3      9

                                               ΣD² = 22

Solution:
$$\rho = 1 - \frac{6\sum D^2}{n^3 - n} = 1 - \frac{6 \times 22}{125 - 5} = -0.1$$


This shows a negative, though small, correlation between the ranks.
If two or more observations have the same value, their ranks are equal and
obtained by calculating the means of the various ranks.


If in this data, marks in maths are 45 for each of the first two students, the
rank of each would be $\frac{3 + 4}{2} = 3.5$. Similarly, if the marks of each of the last
two students in statistics are 48, their ranks would be $\frac{4 + 5}{2} = 4.5$.
The problem takes the following shape:

Marks in    Marks in         Rank
Maths       Stats         x       y       D       D²
45          60            3.5     2       1.5     2.25
45          61            3.5     1       2.5     6.25
60          58            1       3       2       4.00
38          48            5       4.5     0.5     0.25
50          48            2       4.5     2.5     6.25
                                          ΣD² = 19.00

$$\rho = 1 - \frac{6\sum D^2}{n^3 - n} = 1 - \frac{6 \times 19}{120} = 0.05$$

An elaborate formula which can be used in case of equal ranks is

$$\rho = 1 - \frac{6}{n^3 - n}\left[\sum D^2 + \frac{1}{12}\sum\left(m^3 - m\right)\right]$$


Here, $\frac{1}{12}(m^3 - m)$ is to be added to $\sum D^2$ for each group of equal ranks, m
being the number of equal ranks each time.
For the given data, we have
For series x, the number of equal ranks m = 2.
For series y, also, m = 2; so that,


$$\rho = 1 - \frac{6}{5^3 - 5}\left[19 + \frac{1}{12}\left(2^3 - 2\right) + \frac{1}{12}\left(2^3 - 2\right)\right]$$


$$= 1 - \frac{6 \times 20}{120} = 0$$
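
The treatment of equal ranks can be sketched in Python as follows. The function names are assumptions of this sketch, and rank 1 is given to the highest mark, as in the tables above; the marks are those of the modified data with ties.

```python
from collections import Counter

def average_ranks(scores):
    """Rank 1 for the highest score; tied scores share the mean of their ranks."""
    ordered = sorted(scores, reverse=True)
    rank_of = {}
    for s in set(scores):
        first = ordered.index(s) + 1        # first rank occupied by this score
        count = ordered.count(s)            # number of observations tied at it
        rank_of[s] = first + (count - 1) / 2
    return [rank_of[s] for s in scores]

def tie_correction(ranks):
    """(1/12) * sum(m^3 - m) over the groups of m equal ranks."""
    return sum(m ** 3 - m for m in Counter(ranks).values()) / 12

def spearman_rho_with_ties(xs, ys):
    """rho = 1 - 6/(n^3 - n) * [sum(D^2) + tie corrections for both series]."""
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    adjusted = sum_d2 + tie_correction(rank_x) + tie_correction(rank_y)
    return 1 - 6 * adjusted / (n ** 3 - n)

maths = [45, 45, 60, 38, 50]
stats = [60, 61, 58, 48, 48]

print(average_ranks(maths), average_ranks(stats))
print(spearman_rho_with_ties(maths, stats))
```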
Example 4: Show by means of diagrams various cases of scatter expressing
correlation between x, y.
Solution:
[Scatter diagrams not reproduced. The panels show the following cases:]
• Negative slope: inverse linear relationship; high scatter, r low, negative
• Positive slope: direct linear relationship; high scatter, r low, positive
• Inverse curvilinear relationship
• Direct curvilinear relationship
• Perfect relationship, but r = 0 because of non-linear relation


Correlation analysis helps us in determining the degree to which two or 
more variables are related to each other. 


When there are only two variables we can determine the degree to which 
one variable is linearly related to the other. Regression analysis helps in determining 
the pattern of relationship between one or more independent variables and a 
dependent variable. This is done by an equation estimated with the help of data. 


5.6 COEFFICIENT OF DETERMINATION 


Coefficient of determination (r²), which is the square of the coefficient of correlation
(r), is a more precise measure of the strength of the relationship between the two
variables and lends itself to more precise interpretation because it can be presented
as a proportion or as a percentage.


The coefficient of determination (r²) can be defined as the proportion of the
variation in the dependent variable Y that is explained by the variation in the independent
variable X in the regression model. In other words:


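$$r^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{\sum\left(\hat{Y} - \bar{Y}\right)^2}{\sum\left(Y - \bar{Y}\right)^2}$$


or, in a form convenient for computation, with $b_0$ and $b_1$ the intercept and slope
of the regression line,


$$r^2 = \frac{b_0\sum Y + b_1\sum XY - \dfrac{\left(\sum Y\right)^2}{n}}{\sum Y^2 - \dfrac{\left(\sum Y\right)^2}{n}}$$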
Example 5: The heights of fathers and their sons are given. Calculate the coefficient
of correlation r and the coefficient of determination (r²). Also, given that
$b_0 = 26.25$, $b_1 = 0.625$.

Father (X)    Son (Y)
63            66
65            68
66            65
67            67
67            69
68            70

Solution: Now,

$$r^2 = \frac{b_0\sum Y + b_1\sum XY - \dfrac{\left(\sum Y\right)^2}{n}}{\sum Y^2 - \dfrac{\left(\sum Y\right)^2}{n}}$$

X       Y       X²       XY       Y²
63      66      3969     4158     4356
65      68      4225     4420     4624
66      65      4356     4290     4225
67      67      4489     4489     4489
67      69      4489     4623     4761
68      70      4624     4760     4900
ΣX = 396   ΣY = 405   ΣX² = 26152   ΣXY = 26740   ΣY² = 27355

Hence,

$$r^2 = \frac{26.25(405) + 0.625(26740) - \dfrac{(405)^2}{6}}{27355 - \dfrac{(405)^2}{6}}$$

$$= \frac{10631.25 + 16712.5 - 27337.5}{27355 - 27337.5}$$

$$= \frac{6.25}{17.5} = 0.357$$

and $r = \sqrt{r^2} = \sqrt{0.357} = 0.597$

While the value of r = 0.597 is more of an abstract figure, the value of r² =
0.357 tells us that 35.7% of the variation in Y is explained by the variation in X.
This indicates a weak relationship since the value of r² = 0 means no relationship
at all and the value of r² = 1 or 100% means perfect relationship. In general, for a
high degree of correlation which leads to better estimates and prediction, the
coefficient of determination r² must have a high value.
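
The definition of r² as explained variation over total variation can be checked against Example 5 with the Python sketch below. The function name is an assumption of this sketch; the intercept 26.25 and slope 0.625 are the values given in the example.

```python
def coefficient_of_determination(xs, ys, intercept, slope):
    """r^2 = sum((Y_hat - Y_bar)^2) / sum((Y - Y_bar)^2) for the fitted line."""
    y_bar = sum(ys) / len(ys)
    fitted = [intercept + slope * x for x in xs]
    explained = sum((f - y_bar) ** 2 for f in fitted)
    total = sum((y - y_bar) ** 2 for y in ys)
    return explained / total

# Heights of fathers (X) and sons (Y) from Example 5.
fathers = [63, 65, 66, 67, 67, 68]
sons = [66, 68, 65, 67, 69, 70]

r_squared = coefficient_of_determination(fathers, sons, 26.25, 0.625)
print(round(r_squared, 3), round(r_squared ** 0.5, 3))
```

The explained variation here is 6.25 and the total variation is 17.5, the same two figures that appear in the numerator and denominator of the computational formula.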


Check Your Progress 


5. What do you mean by coefficient of correlation? 


6. State the assumptions for finding the coefficient of correlation by Karl 
Pearson’s method. 


7. Define coefficient of determination, r².


5.7 ANSWERS TO CHECK YOUR PROGRESS 
QUESTIONS 


1. There are several types of correlations. They are: 
(a) Positive or negative correlations 
(b) Linear or non-linear correlations 
(c) Simple, partial or multiple correlations 


2. Correlation analysis is the statistical tool that is generally used to describe 
the degree to which one variable is related to another. The relationship, if 
any, is usually assumed to be a linear one. This analysis is used quite frequently 
in conjunction with regression analysis to measure how well the regression 
line explains the variations of the dependent variable. In fact, the word 
correlation refers to the relationship or interdependence between two 
variables. There are various phenomena, which are related to each other. 
For instance, when demand of a certain commodity increases, then its price 
goes up and when its demand decreases then its price comes down. 


3. Scatter diagram is the method to calculate the constants in regression models 
that makes use of scatter diagram or dot diagram. A scatter diagram is a 
diagram that represents two series with the known variables, i.e., independent 
variable plotted on the X-axis and the variable to be estimated, i.e., dependent 
variable to be plotted on the Y-axis. 


4. The least squares method is a method to calculate the constants in regression 
models for fitting a line through the scatter diagram that minimizes the sum 
of the squared vertical deviations from the fitted line. In other words, the 
line to be fitted will pass through the points of the scatter diagram in such a 
fashion that the sum of the squares of the vertical deviations of these points 
from the line will be a minimum.

5. The coefficient of correlation, which is symbolically denoted by r, is another
important measure to describe how well one variable explains another. It
measures the degree of relationship between two causally related variables.
The value of this coefficient can never be more than +1 or less than −1. Thus,
+1 and −1 are the limits of this coefficient.


6. Karl Pearson’s method is the most widely used method of measuring the
relationship between two variables. This coefficient is based on the following
assumptions:

(i) There is a linear relationship between the two variables which means
that a straight line would be obtained if the observed data is plotted on
a graph.

(ii) The two variables are causally related which means that one of the
variables is independent and the other is dependent.


(iii) A large number of independent causes are operating in both the variables
so as to produce a normal distribution.


7. The coefficient of determination (r²), the square of the coefficient of
correlation (r), is a more precise measure of the strength of the relationship
between the two variables and lends itself to more precise interpretation
because it can be presented as a proportion or as a percentage.


5.8 SUMMARY 


• Correlation analysis is the statistical tool generally used to describe the degree
to which one variable is related to another. The relationship, if any, is usually
assumed to be a linear one. This analysis is used quite frequently in
conjunction with regression analysis to measure how well the regression
line explains the variations of the dependent variable.


• Correlation can either be positive or it can be negative. Whether correlation
is positive or negative would depend upon the direction in which the variables
are moving.


• Non-linear correlation is also known as curvilinear correlation. The distinction
is based upon the constancy of the ratio of change between the variables.


• Least squares method of fitting a line (the line of best fit or the regression
line) through the scatter diagram is a method which minimizes the sum of the
squared vertical deviations from the fitted line.


• Karl Pearson’s coefficient of correlation assumes that there is a linear
relationship between the two variables, which means that a straight line would
be obtained if the observed data is plotted on a graph.


• It also assumes that the two variables are causally related, which means that
one of the variables is independent and the other one is dependent.


• If observations on two variables are given in the form of ranks and not as
numerical values, it is possible to compute what is known as rank correlation.


• Coefficient of determination (r²), which is the square of the coefficient of
correlation (r), is a more precise measure of the strength of the relationship
between the two variables and lends itself to more precise interpretation
because it can be presented as a proportion or as a percentage.


5.9 KEY WORDS 


• Correlation analysis: It is the statistical tool to describe the degree to
which one variable is related to another.


• Scatter diagram: It is a graph of observed plotted points where each
point represents the values of X and Y as a coordinate.


• Coefficient of determination: It can be defined as the proportion of the
variation in the dependent variable Y that is explained by the variation in
independent variable X.


• Rank correlation: In this type of correlation, observations on two variables
are given in the form of ranks instead of numerical values. Rank correlation
is a descriptive index of agreement between ranks over individuals.


5.10 SELF-ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 


1. What is a scatter diagram? 
2. Under what conditions will you say that a correlation is linear? 


3. How does a scatter diagram help in studying the correlation between two 
variables? 


4. List the different types of correlation. 


5. Define correlation analysis. 
Long-Answer Questions 


1. Obtain the estimating equation by the method of least squares from the
following information:

X                         Y
(Independent variable)    (Dependent variable)
2                         18
4                         12
5                         10
6                         8
8                         7
2. Find out the coefficient of correlation between the two kinds of assessment
of M.A. students’ performance.

(i) By adopting Karl Pearson’s method

(ii) By the method of least squares

S.N. of      Internal assessment      External assessment
Students     (Marks obtained          (Marks obtained
             out of 100)              out of 100)
1 51 49 
2 63 72 
3 B 4 
4 4 
5 50 58 
6 60 66 
7 47 50 
8 36 30 
9 60 35 
Also, work out r, and interpret the same. 
3. Calculate correlation coefficient from the following results:
$n = 10$; $\sum X = 140$; $\sum Y = 150$
$\sum (X - 10)^2 = 180$; $\sum (Y - 15)^2 = 215$
$\sum (X - 10)(Y - 15) = 60$
4. Given is the following information:

Observation    Test score X    Sales Y ('000 Rs)
1              73              450
2              78              490
3              92              570
4              61              380
5              87              540
6              81              500
7              77              480
8              70              430
9              65              410
10             82              490
Total          766             4740


You are required to: 
(i) Graph the scatter diagram for the given data. 


(ii) Find the regression equation and draw the line corresponding to the
equation on the scatter diagram.


(iii) Make an estimate of sales if the test score happens to be 75. 
5. Calculate correlation coefficient and the two regression lines for the following
information:

                               Ages of Wives (in years)
Ages of Husbands (in years)    10-20    20-30    30-40    40-50    Total
10-20                          20       26       -        -        46
20-30                          8        14       37       -        59
30-40                          -        4        18       3        25
40-50                          -        -        4        6        10
Total                          28       44       59       9        140


6. To know what relationship exists between unemployment and suicide 
attempts, a sociologist surveyed twelve cities and obtained the following 
data: 


S.N. of the    Unemployment rate    Number of suicide attempts
city           (per cent)           per 1000 residents
1              7.3                  22
2              6.4                  17
3              6.2                  9
4              5.5                  8
5              6.4                  12
6              4.7                  5
7              5.8                  7
8              7.9                  19
9              6.7                  13
10             9.6                  29
11             10.3                 33
12             7.2                  18


(i) Develop the estimating equation that best describes the given 
relationship. 

(ii) Find a prediction interval (with 95% confidence level) for the attempted
suicide rate when the unemployment rate happens to be 6%.
UNIT 6 DISTRIBUTIONS


Structure
6.0 Introduction 
6.1 Objectives 
6.2 Binomial Distribution 
6.3 The Poisson Distribution 
6.4 Limiting Form of Binomial and Poisson Fitting 
6.5 Discrete Theoretical Distributions 
6.6 Answers to Check Your Progress Questions 
6.7 Summary 
6.8 Key Words 
6.9 Self-Assessment Questions and Exercises 
6.10 Further Readings 


6.0 INTRODUCTION 


Binomial distribution is used in finite sampling problems where each observation is
one of two possible outcomes (‘success’ or ‘failure’). Poisson distribution is used
for modelling rates of occurrence. Exponential distribution is used to describe
units that have a constant failure rate. The term ‘normal distribution’ refers to a
particular way in which observations tend to pile up around a particular value
rather than be spread evenly across a range of values; the Central Limit Theorem
(CLT) explains why this pattern arises so often. It is generally most applicable to
continuous data and is intrinsically associated with parametric statistics (for
example, ANOVA, t-test, regression analysis). Graphically, the normal distribution
is best described by a bell-shaped curve. This curve is described in terms of the
point at which its height is maximum, i.e., its mean, and its width, i.e., its
standard deviation.


In this unit, you will study about the binomial distribution, Poisson distribution, 
limiting form of binomial and Poisson fitting and discrete theoretical distributions. 


6.1 UNIT OBJECTIVES 


After going through this unit, you will be able to: 
• Understand the binomial distribution
• Analyse the Poisson distribution
• Discuss the limiting form of binomial and Poisson fitting
• Explain the discrete theoretical distributions
6.2 BINOMIAL DISTRIBUTION


Binomial distribution (or the Binomial probability distribution) is a widely used
probability distribution concerned with a discrete random variable and as such is
an example of a discrete probability distribution. The binomial distribution describes
discrete data resulting from what is often called the Bernoulli process. The
tossing of a fair coin a fixed number of times is a Bernoulli process and the outcome
of such tosses can be represented by the binomial distribution. The name of the Swiss
mathematician Jacob Bernoulli is associated with this distribution. This distribution
applies in situations where there are repeated trials of any experiment for which
only one of two mutually exclusive outcomes (often denoted as ‘success’ and
‘failure’) can result on each trial.


Bernoulli Process 
Binomial distribution is considered appropriate in a Bernoulli process which has 
the following characteristics: 


(a) Dichotomy. This means that each trial has only two mutually exclusive 
possible outcomes, for example, ‘Success’ or ‘failure’, ‘Yes’ or ‘No’, 
‘Heads’ or ‘Tails’ and the like. 


(b) Stability. This means that the probability of the outcome of any trial is 
known (or given) and remains fixed over time, i.e., remains the same for all 
the trials. 


(c) Independence. This means that the trials are statistically independent, i.e., 
to say the happening of an outcome or the event in any particular trial is 
independent of its happening in any other trial or trials. 


Probability Function of Binomial Distribution 


The random variable, say X, in the Binomial distribution is the number of ‘successes’
in n trials. The probability function of the binomial distribution is written as under:

$$f(X = r) = {}^{n}C_{r}\, p^{r} q^{n-r}, \qquad r = 0, 1, 2, \ldots, n$$

Where, n = Number of trials.
p = Probability of success in a single trial.
q = (1 - p) = Probability of ‘failure’ in a single trial.
r = Number of successes in ‘n’ trials.
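
The probability function above translates directly into a few lines of code. The following is a minimal sketch in Python; the function name `binomial_pmf` and the sample values n = 10, p = 0.5 are illustrative choices, not part of the text.

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r successes in n independent trials."""
    q = 1.0 - p  # probability of 'failure' in a single trial
    return comb(n, r) * p**r * q**(n - r)

# The probabilities for r = 0, 1, ..., n should add up to 1.
probabilities = [binomial_pmf(r, 10, 0.5) for r in range(11)]
total = sum(probabilities)
```

Here `comb(n, r)` plays the role of the coefficient written as nCr in the formula.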


Parameters of Binomial Distribution 


This distribution depends upon the values of p and n, which in fact are its parameters.
Knowledge of p fully defines the probability distribution of X, since n is known by
definition of the problem. The probability of the happening of exactly r events in n
trials can be found out using the above stated binomial function.


The value of p also determines the general appearance of the binomial 
distribution, if shown graphically. In this context the usual generalizations are: 


(a) When p is small (say 0.1), the binomial distribution is skewed to the right,
i.e., the graph takes the form shown in Figure 6.1.

Fig. 6.1 (vertical axis: Probability; horizontal axis: No. of Successes)


(b) When p is equal to 0.5, the binomial distribution is symmetrical and the 
graph takes the form as shown in Figure 6.2. 


Fig. 6.2 (vertical axis: Probability; horizontal axis: No. of Successes)


(c) When p is larger than 0.5, the binomial distribution is skewed to the left and
the graph takes the form as shown in Figure 6.3.

Fig. 6.3 (vertical axis: Probability; horizontal axis: No. of Successes)


But if ‘p’ stays constant and ‘n’ increases, then as ‘n’ increases the vertical
lines become not only numerous but also tend to bunch up together to form a bell
shape, i.e., the binomial distribution tends to become symmetrical and the graph
takes the shape as shown in Figure 6.4.

Fig. 6.4 (vertical axis: Probability; horizontal axis: No. of Successes)


Important Measures of Binomial Distribution 


The expected value of the random variable [i.e., E(X)] or mean of the random variable
(i.e., $\bar{X}$) of the binomial distribution is equal to n.p and the variance of the
random variable is equal to n.p.q or n.p.(1 - p). Accordingly, the standard deviation
of the binomial distribution is equal to $\sqrt{n.p.q}$. The other important measures
relating to the binomial distribution are as under:

$$\text{Skewness} = \frac{1 - 2p}{\sqrt{n.p.q}}$$

$$\text{Kurtosis} = 3 + \frac{1 - 6p + 6p^{2}}{n.p.q}$$
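
As a rough check on these measures, the sketch below computes the mean, variance, skewness and kurtosis directly from the probability function and sets them beside the closed-form expressions, taking the kurtosis numerator as 1 - 6p + 6p² (which equals 1 - 6pq). The helper name `binomial_shape` and the values n = 10, p = 0.1 are assumptions made purely for illustration.

```python
from math import comb, sqrt

def binomial_shape(n: int, p: float):
    """Return (mean, variance, skewness, kurtosis) computed from the pmf."""
    q = 1.0 - p
    pmf = [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]
    mean = sum(r * f for r, f in enumerate(pmf))
    # Central moments of order 2, 3 and 4.
    m2, m3, m4 = (sum((r - mean)**k * f for r, f in enumerate(pmf)) for k in (2, 3, 4))
    return mean, m2, m3 / m2**1.5, m4 / m2**2

n, p = 10, 0.1          # illustrative values, not taken from the text
q = 1.0 - p
mean, variance, skewness, kurtosis = binomial_shape(n, p)

# Closed-form measures for comparison.
formula_skewness = (1 - 2 * p) / sqrt(n * p * q)
formula_kurtosis = 3 + (1 - 6 * p + 6 * p**2) / (n * p * q)
```

With a small p the skewness comes out positive, which matches the right-skewed shape described for small values of p.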


When to Use Binomial Distribution 


The use of binomial distribution is most appropriate in situations fulfilling the 
conditions outlined above. Two such situations, for example, can be described as 
follows. 

(a) When we have to find the probability of 6 heads in 10 throws of a fair coin.


(b) When we have to find the probability that 3 out of 10 items produced by a 
machine, which produces 8 per cent defective items on an average, will be 
defective. 


Example 1: A fair coin is thrown 10 times. The random variable X is the number
of head(s) coming upwards. Using the binomial probability function, find the
probabilities of all possible values which X can take and then verify that the binomial
distribution has a mean $\bar{X} = n.p$ and variance $\sigma^{2} = n.p.q$.


Solution: Since the coin is fair, when thrown it can come up either with head
upward or tail upward. Hence, p (head) = 1/2 and q (no head) = 1/2. The required
probability function is,

$$f(X = r) = {}^{10}C_{r}\, p^{r} q^{10-r}, \qquad r = 0, 1, 2, \ldots, 10$$


The following table of binomial probability distribution is constructed using this
function.

X_i (Number    Probability p_i              X_i p_i       (X_i - X̄)    (X_i - X̄)²    (X_i - X̄)² p_i
of Heads)
0              10C0 p^0 q^10 = 1/1024       0/1024        -5           25            25/1024
1              10C1 p^1 q^9 = 10/1024       10/1024       -4           16            160/1024
2              10C2 p^2 q^8 = 45/1024       90/1024       -3           9             405/1024
3              10C3 p^3 q^7 = 120/1024      360/1024      -2           4             480/1024
4              10C4 p^4 q^6 = 210/1024      840/1024      -1           1             210/1024
5              10C5 p^5 q^5 = 252/1024      1260/1024     0            0             0/1024
6              10C6 p^6 q^4 = 210/1024      1260/1024     1            1             210/1024
7              10C7 p^7 q^3 = 120/1024      840/1024      2            4             480/1024
8              10C8 p^8 q^2 = 45/1024       360/1024      3            9             405/1024
9              10C9 p^9 q^1 = 10/1024       90/1024       4            16            160/1024
10             10C10 p^10 q^0 = 1/1024      10/1024       5            25            25/1024

$$\bar{X} = \sum X_i\, p_i = \frac{5120}{1024} = 5$$

$$\text{Variance} = \sigma^{2} = \sum (X_i - \bar{X})^{2}\, p_i = \frac{2560}{1024} = 2.5$$


The mean of the binomial distribution is given by $n.p = 10 \times \frac{1}{2} = 5$ and the
variance of this distribution is equal to $n.p.q = 10 \times \frac{1}{2} \times \frac{1}{2} = 2.5$.

These values are exactly the same as those found in the above table.

Hence, these values stand verified with the calculated values of the two
measures as shown in the table.
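
The tabular verification in Example 1 can be reproduced with exact fractions. This is a small sketch using Python's standard `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

n, p = 10, Fraction(1, 2)   # 10 throws of a fair coin
q = 1 - p

# Probability of r heads for r = 0, 1, ..., 10, kept as exact fractions.
pmf = {r: comb(n, r) * p**r * q**(n - r) for r in range(n + 1)}

mean = sum(r * f for r, f in pmf.items())
variance = sum((r - mean)**2 * f for r, f in pmf.items())

print(mean, variance)  # 5 5/2
```

Exact fractions avoid any rounding, so the comparison with n.p = 5 and n.p.q = 2.5 is an equality rather than an approximation.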


Check Your Progress 


1. Explain the Bernoulli process. 
2. Write the probability function of binomial distribution. 

3. Explain the different parameters of binomial distributions. 
4. Explain the important measures of binomial distribution. 


5. Under what circumstances will you use binomial distribution? 


6.3 THE POISSON DISTRIBUTION 


Poisson distribution is also a discrete probability distribution, associated with
the name of a Frenchman, Siméon Denis Poisson, who developed it.
This distribution is frequently used in the context of Operations Research and for this
reason has great significance for management people. It plays an important role in
Queuing theory, Inventory control problems and also in Risk models.
Unlike binomial distribution, Poisson distribution cannot be deduced on
purely theoretical grounds based on the conditions of the experiment. In fact, it
must be based on experience, i.e., on the empirical results of past experiments
relating to the problem under study. Poisson distribution is appropriate especially
when the probability of happening of an event is very small (so that q or (1 - p) is
almost equal to unity) and n is very large such that the average of the series (viz., n.p)
is a finite number. Experience has shown that this distribution is good for calculating
the probabilities associated with X occurrences in a given time period or specified
area.

The random variable of interest in Poisson distribution is the number of
occurrences of a given event during a given interval (the interval may be time, distance,
area, etc.). We use capital X to represent the discrete random variable and lower
case x to represent a specific value that capital X can take. The probability function
of this distribution is generally written as under:

$$f(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots$$

Where, λ = Average number of occurrences per specified interval. In other
words, it is the mean of the distribution.

e = 2.7183, being the base of natural logarithms.

x = Number of occurrences of a given event.
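
A minimal sketch of this probability function in Python follows. The function name `poisson_pmf`, the choice λ = 4 and the cut-off of 60 terms used for the sanity check are all illustrative assumptions.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x occurrences when the mean number is lam."""
    return lam**x * exp(-lam) / factorial(x)

# Sanity check: with lam = 4 the probabilities for x = 0..59 already
# add up to 1 for all practical purposes (the remaining tail is negligible).
total = sum(poisson_pmf(x, 4.0) for x in range(60))
```

Unlike the binomial case there is no upper limit on x, so any numerical check has to truncate the sum at some point.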


Poisson Process 


The distribution applies in the case of a Poisson process, which has the following
characteristics:


• Concerning a given random variable, the mean relating to a given interval
can be estimated on the basis of past data concerning the variable under
study.


• If we divide the given interval into very small intervals, we will find:


(a) The probability that exactly one event will happen during a very
small interval is a very small number and is constant for every other
very small interval.


(b) The probability that two or more events will happen within a very small
interval is so small that we can assign it a zero value.


(c) The occurrence of an event in a given very small interval is independent
of where that very small interval falls within the given interval.


(d) The number of events in any small interval is not dependent on the
number of events in any other small interval.


Parameter and Important Measures of Poisson Distribution 


Poisson distribution depends upon the value of λ, the average number of
occurrences per specified interval, which is its only parameter. The probability
of exactly x occurrences can be found out using the Poisson probability function
stated above. The expected value or the mean of a Poisson random variable is
λ and its variance is also λ. The standard deviation of Poisson distribution
is $\sqrt{\lambda}$.

Underlying the Poisson model is the assumption that if there are on the
average λ occurrences per interval t, then there are on the average kλ
occurrences per interval kt. For example, if the number of arrivals at a service
counter in a given hour has a Poisson distribution with λ = 4, then y, the
number of arrivals at a service counter in a given 6 hour day, has the Poisson
distribution with λ = 24, i.e., 6 × 4.


When to Use Poisson Distribution 


Poisson distribution is resorted to in those cases when we do not know
the value of ‘n’ or when ‘n’ cannot be estimated with any degree of accuracy.
In fact, in certain cases it does not make any sense to ask for the value of ‘n’.
For example, if the goals scored by one team in a football match are given, it
cannot be stated how many goals could not be scored. Similarly, if one watches
carefully one may find out how many times the lightning flashed but it is not
possible to state how many times it did not flash. It is in such cases that we use
Poisson distribution. The number of deaths per day in a district in one year due
to a disease, the number of scooters passing through a road per minute during
a certain part of the day for a few months, and the number of printing mistakes per
page in a book containing many pages are a few other examples where Poisson
probability distribution is generally used.


Example 2: Suppose that a manufactured product has 2 defects per unit of
product inspected. Use Poisson distribution and calculate the probabilities of
finding a product without any defect, with 3 defects and with 4 defects.


Solution: The product has 2 defects per unit of product inspected. Hence,
λ = 2.


Poisson probability function is as follows:

$$f(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots$$

Using the above probability function, we find the required probabilities as under:

$$P(\text{without any defects, i.e., } x = 0) = \frac{2^{0} e^{-2}}{0!} = \frac{1.(0.13534)}{1} = 0.13534$$

$$P(\text{with 3 defects, i.e., } x = 3) = \frac{2^{3} e^{-2}}{3!} = \frac{2 \times 2 \times 2\,(0.13534)}{3 \times 2 \times 1} = \frac{0.54136}{3} = 0.18045$$

$$P(\text{with 4 defects, i.e., } x = 4) = \frac{2^{4} e^{-2}}{4!} = \frac{2 \times 2 \times 2 \times 2\,(0.13534)}{4 \times 3 \times 2 \times 1} = \frac{0.27068}{3} = 0.09023$$
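
The three probabilities of Example 2 can be recomputed as follows. This is only a sketch; the helper `poisson_pmf` is an illustrative name, and small differences in the last decimal place against the hand calculation are to be expected because the worked solution rounds e⁻² to 0.13534.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability of exactly x occurrences with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

lam = 2.0                       # 2 defects per unit inspected
p_no_defect = poisson_pmf(0, lam)
p_three_defects = poisson_pmf(3, lam)
p_four_defects = poisson_pmf(4, lam)

for label, value in [("x = 0", p_no_defect),
                     ("x = 3", p_three_defects),
                     ("x = 4", p_four_defects)]:
    print(label, round(value, 4))
```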


Example 3: How would you use a Poisson distribution to find approximately
the probability of exactly 5 successes in 100 trials, the probability of success in
each trial being p = 0.1?


Solution: In the question we have been given,
n = 100 and p = 0.1
∴ λ = n.p = 100 × 0.1 = 10


To find the required probability, we can use the Poisson probability function as an
approximation to the Binomial probability function as shown below:

$$f(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!} = \frac{(n.p)^{x} e^{-(n.p)}}{x!}$$

or, taking $e^{-10} \approx 0.00005$,

$$P(5) = \frac{10^{5}\, e^{-10}}{5!} = \frac{(100000)(0.00005)}{5 \times 4 \times 3 \times 2 \times 1} = \frac{5.00000}{5 \times 4 \times 3 \times 2 \times 1} = \frac{1}{24} = 0.042$$

Check Your Progress 


6. What is Poisson distribution? 


7. Where and when will you use Poisson distribution? 


6.4 LIMITING FORM OF BINOMIAL AND 
POISSON FITTING 


Fitting a Binomial Distribution 


When a binomial distribution is to be fitted to the given data, then the following 
procedure is adopted: 
(a) Determine the values of ‘p’ and ‘q’ keeping in view that $\bar{X} = n.p$ and
q = (1 - p).

(b) Find the probabilities for all possible values of the given random variable
applying the binomial probability function, viz.,

$$f(X = r) = {}^{n}C_{r}\, p^{r} q^{n-r}, \qquad r = 0, 1, 2, \ldots, n$$


(c) Work out the expected frequencies for all values of random variable by 
multiplying N (the total frequency) with the corresponding probability. 


(d) The expected frequencies, so calculated, constitute the fitted binomial 
distribution to the given data. 
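
Steps (a) to (d) can be sketched in code as below. The observed frequencies are hypothetical numbers invented for illustration (they do not come from the text), as are the variable names.

```python
from math import comb

# Hypothetical data: number of sets (out of N) showing r successes in n = 4 trials.
n = 4
observed = [28, 62, 46, 10, 4]   # frequencies for r = 0, 1, 2, 3, 4
N = sum(observed)                # total frequency

# Step (a): the sample mean equals n.p, so p = mean / n and q = 1 - p.
mean = sum(r * f for r, f in enumerate(observed)) / N
p = mean / n
q = 1 - p

# Step (b): binomial probabilities for every possible value of r.
probabilities = [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]

# Steps (c) and (d): the expected frequencies form the fitted distribution.
expected = [N * prob for prob in probabilities]

for r, (obs, exp_freq) in enumerate(zip(observed, expected)):
    print(r, obs, round(exp_freq, 1))
```

Printing the observed and expected frequencies side by side gives a first visual impression of how well the binomial model fits.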
Fitting a Poisson Distribution 


When a Poisson distribution is to be fitted to the given data, then the following 
procedure is adopted: 


(a) Determine the value of λ, the mean of the distribution.


(b) Find the probabilities for all possible values of the given random variable
using the Poisson probability function, viz.,

$$f(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots$$

(c) Work out the expected frequencies as follows:

$$N.f(X = x)$$

where N is the total frequency.

(d) The result of case (c) above is the fitted Poisson distribution to the given
data.
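
The same procedure for a Poisson fit might look like the sketch below. The observed frequencies are again hypothetical and serve only to show the mechanics; N is taken to be the total observed frequency.

```python
from math import exp, factorial

# Hypothetical data: number of intervals in which x events were observed.
observed = [110, 60, 24, 5, 1]   # frequencies for x = 0, 1, 2, 3, 4
N = sum(observed)                # total frequency

# Step (a): lambda is estimated by the mean of the observed distribution.
lam = sum(x * f for x, f in enumerate(observed)) / N

# Step (b): Poisson probabilities for the tabulated values of x.
probabilities = [lam**x * exp(-lam) / factorial(x) for x in range(len(observed))]

# Steps (c) and (d): expected frequencies N.f(X = x) are the fitted distribution.
expected = [N * prob for prob in probabilities]
```

Because x has no upper limit, the expected frequencies over the tabulated values add up to slightly less than N; the shortfall is the probability of values beyond the last class.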


Poisson Distribution as an Approximation of Binomial Distribution 


Under certain circumstances Poisson distribution can be considered as a
reasonable approximation of Binomial distribution and can be used accordingly.
The circumstances which permit this are when ‘n’ is large, approaching
infinity, and p is small, approaching zero (n = Number of Trials, p = Probability
of ‘Success’). Statisticians usually take large n, for this purpose, to mean
n ≥ 20 and small ‘p’ to mean p < 0.05. In cases where these
two conditions are fulfilled, we can use the mean of the binomial distribution (viz.,
n.p) in place of the mean of Poisson distribution (viz., λ) so that the probability
function of Poisson distribution becomes as stated below:

$$f(X = x) = \frac{(n.p)^{x}\, e^{-(n.p)}}{x!}$$

We can explain Poisson distribution as an approximation of the Binomial
distribution with the help of the following example.
Example 4: Given is the following information:
(a) There are 20 machines in a certain factory, i.e., n = 20.
(b) The probability of a machine going out of order during any day is 0.02.


What is the probability that exactly 3 machines will be out of order on the same
day? Calculate the required probability using both the Binomial and Poisson
distributions and state whether Poisson distribution is a good approximation of
the Binomial distribution in this case.


Solution: Probability, as per Poisson probability function (using n.p in place
of λ, since n = 20 and p < 0.05):

$$f(X = x) = \frac{(n.p)^{x}\, e^{-(n.p)}}{x!}$$

Where, x means number of machines becoming out of order on the same day.

$$P(X = 3) = \frac{(20 \times 0.02)^{3}\, e^{-(20 \times 0.02)}}{3!} = \frac{(0.4)^{3}(0.67032)}{3 \times 2 \times 1} = \frac{(0.064)(0.67032)}{6} = 0.00715$$

Probability, as per Binomial probability function,

$$f(X = r) = {}^{n}C_{r}\, p^{r} q^{n-r}$$

Where, n = 20, r = 3, p = 0.02 and hence q = 0.98

$$f(X = 3) = {}^{20}C_{3}\,(0.02)^{3}(0.98)^{17} = 0.00647$$

The difference between the probability of 3 machines becoming out of order on
the same day calculated using the Poisson probability function and the binomial
probability function is just 0.00068. The difference being very small, we can state that
in the given case Poisson distribution appears to be a good approximation of
Binomial distribution.
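
Both probabilities of Example 4 can be recomputed numerically, which is a convenient way to check the rounding in a hand calculation. The following is a sketch with illustrative variable names.

```python
from math import comb, exp, factorial

n, p, x = 20, 0.02, 3           # 20 machines, P(out of order) = 0.02, exactly 3 fail
q = 1 - p
lam = n * p                     # n.p used in place of lambda

poisson_prob = lam**x * exp(-lam) / factorial(x)
binomial_prob = comb(n, x) * p**x * q**(n - x)
difference = poisson_prob - binomial_prob

print(round(poisson_prob, 5), round(binomial_prob, 5), round(difference, 5))
```

To five decimal places the Poisson value is about 0.00715 and the binomial value about 0.00647, so the two differ by less than one part in a thousand of probability.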


Example 5: How would you use a Poisson distribution to find approximately
the probability of exactly 5 successes in 100 trials, the probability of success in
each trial being p = 0.1?


Solution: In the question we have been given,
n = 100 and p = 0.1
∴ λ = n.p = 100 × 0.1 = 10


To find the required probability, we can use the Poisson probability function as an
approximation to the Binomial probability function, as shown below:

$$f(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!} = \frac{(n.p)^{x} e^{-(n.p)}}{x!}$$

Or, taking $e^{-10} \approx 0.00005$,

$$P(5) = \frac{10^{5}\, e^{-10}}{5!} = \frac{(100000)(0.00005)}{5 \times 4 \times 3 \times 2 \times 1} = \frac{5.00000}{5 \times 4 \times 3 \times 2 \times 1} = \frac{1}{24} = 0.042$$

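
The worked answer of 0.042 depends on rounding e⁻¹⁰ to 0.00005. The sketch below, with illustrative variable names, recomputes the Poisson value without that rounding and places it beside the exact binomial probability.

```python
from math import comb, exp, factorial

n, p, x = 100, 0.1, 5
lam = n * p                                   # lambda = n.p = 10

poisson_prob = lam**x * exp(-lam) / factorial(x)
binomial_prob = comb(n, x) * p**x * (1 - p)**(n - x)

print(round(poisson_prob, 4), round(binomial_prob, 4))
```

With unrounded e⁻¹⁰ the Poisson value is close to 0.038, while the exact binomial probability is close to 0.034. The gap is noticeable here because p = 0.1 lies outside the usual p < 0.05 guideline for using the approximation.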
Check Your Progress 


8. How can we measure the area under the curve? 


9. Under what circumstances, is a Poisson distribution considered as an 
approximation of binomial distribution? 


6.5 DISCRETE THEORETICAL DISTRIBUTIONS 


Theoretical Distributions: If a certain hypothesis is assumed, it is sometimes
possible to derive mathematically what the frequency distributions of certain
universes should be. Such distributions are called theoretical distributions.


Binomial Distribution 


Binomial distribution was discovered by James Bernoulli in the year 1700. 

Let there be an event whose probability of success is P and whose probability
of failure is Q in one trial, so that P + Q = 1.

Consider a set of n independent trials in which the probability P of success is the
same in every trial; then Q = 1 - P is the probability of failure in any trial.


Let the set of n trials be repeated N times, where N is a very large number.
Out of these N sets, some will have few successes, some will have a larger number
of successes, and so on.

Now the probability that the first k trials are successes and the remaining
(n - k) trials are failures is $P^{k} Q^{n-k}$.

Since k can be chosen out of n trials in ${}^{n}C_{k}$ ways, the probability of
k successes, P(k), in a series of n independent trials is given by,

$$P(k) = {}^{n}C_{k}\, P^{k} Q^{n-k}$$

The probability distribution of the number of successes so obtained is called
the ‘Binomial probability distribution’.

The probabilities of 0, 1, 2, ..., n successes, namely ${}^{n}C_{0} P^{0} Q^{n}$,
${}^{n}C_{1} P^{1} Q^{n-1}$, ..., ${}^{n}C_{n} P^{n} Q^{0}$, are the successive terms of
the binomial expansion $(Q + P)^{n}$.
Definition: A random variable X is said to follow binomial distribution if it assumes
only non-negative values and its probability mass function is given by,

$$P(X = k) = P(k) = \begin{cases} {}^{n}C_{k}\, P^{k} Q^{n-k}; & k = 0, 1, 2, \ldots, n;\ Q = 1 - P \\ 0; & \text{otherwise} \end{cases}$$

Usually we use the notation X ~ B(n, P) to denote that the random variable X
follows binomial distribution with parameters n and P.


Notes: 1. Since n and P are independent constants in the binomial distribution,
they are called the parameters of the distribution.

2. $\sum_{k=0}^{n} P(k) = \sum_{k=0}^{n} {}^{n}C_{k}\, P^{k} Q^{n-k} = (Q + P)^{n} = 1$

3. If the experiment consisting of n trials is repeated N times, then the
frequency function of the binomial distribution is given by,

$$f(k) = N.P(k) = N\left[{}^{n}C_{k}\, P^{k} Q^{n-k}\right]; \qquad k = 0, 1, 2, \ldots, n$$

and the expected frequencies of 0, 1, 2, ..., n successes are the successive terms of
the binomial expansion $N(Q + P)^{n}$; $(Q + P = 1)$.


Moments 


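The first four moments about origin of binomial distribution are obtained as follows:
(i) Mean or First Moment about Origin

$$\begin{aligned}
\mu'_1 = E(X) &= \sum_{x=0}^{n} x\, {}^{n}C_{x}\, P^{x} Q^{n-x} \\
&= nP \sum_{x=1}^{n} {}^{n-1}C_{x-1}\, P^{x-1} Q^{n-x} \\
&= nP\left[Q^{n-1} + {}^{n-1}C_{1}\, P\, Q^{n-2} + \ldots + P^{n-1}\right] \\
&= nP(Q + P)^{n-1} = nP \quad \text{as } P + Q = 1
\end{aligned}$$

So, mean or $\mu'_1 = nP$.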
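(ii) Second Moment about Origin

$$\begin{aligned}
\mu'_2 = E(X^{2}) &= \sum_{x=0}^{n} x^{2}\, {}^{n}C_{x}\, P^{x} Q^{n-x} \\
&= \sum_{x=0}^{n} \left[x + x(x-1)\right] {}^{n}C_{x}\, P^{x} Q^{n-x} \\
&= \sum_{x=0}^{n} x\, {}^{n}C_{x}\, P^{x} Q^{n-x} + \sum_{x=0}^{n} x(x-1)\, {}^{n}C_{x}\, P^{x} Q^{n-x} \\
&= nP + n(n-1)P^{2} \sum_{x=2}^{n} {}^{n-2}C_{x-2}\, P^{x-2} Q^{n-x} \\
&= nP + n(n-1)P^{2}(Q + P)^{n-2}
\end{aligned}$$

$$\mu'_2 = nPQ + n^{2}P^{2}$$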
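(iii) Third Moment about Origin

$$\mu'_3 = E(X^{3}) = \sum_{x=0}^{n} x^{3}\, {}^{n}C_{x}\, P^{x} Q^{n-x}$$

On simplifying, we get,

$$\mu'_3 = n(n-1)(n-2)P^{3} + 3n(n-1)P^{2} + nP$$

(iv) Fourth Moment about Origin

$$\mu'_4 = E(X^{4}) = \sum_{x=0}^{n} x^{4}\, {}^{n}C_{x}\, P^{x} Q^{n-x}$$

And on simplifying, we get,

$$\mu'_4 = n(n-1)(n-2)(n-3)P^{4} + 6n(n-1)(n-2)P^{3} + 7n(n-1)P^{2} + nP$$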
Now, the moments about the mean of the binomial distribution will be discussed.


(i) First Moment about Mean

$$\mu_1 = 0 \quad \text{(always)}$$

(ii) Second Moment about Mean

$$\mu_2 = \mu'_2 - (\mu'_1)^{2} = E(X^{2}) - [E(X)]^{2} = nPQ + n^{2}P^{2} - n^{2}P^{2} = nPQ$$

Thus, Var (X) = $\mu_2 = nPQ$ and standard deviation $\sigma(X) = \sqrt{nPQ}$
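(iii) Third and Fourth Moment about Mean

(a) $\mu_3 = \mu'_3 - 3\mu'_2\,\mu'_1 + 2(\mu'_1)^{3} = nPQ(Q - P)$ (On simplification)

(b) $\mu_4 = \mu'_4 - 4\mu'_3\,\mu'_1 + 6\mu'_2\,(\mu'_1)^{2} - 3(\mu'_1)^{4}$

$$= 3n^{2}P^{2}Q^{2} - 6nP^{2}Q^{2} + nPQ = nPQ\left[1 + 3(n-2)PQ\right]$$

Hence,

$$\beta_1 = \frac{\mu_3^{2}}{\mu_2^{3}} = \frac{n^{2}P^{2}Q^{2}(Q - P)^{2}}{n^{3}P^{3}Q^{3}} = \frac{(Q - P)^{2}}{nPQ}$$

$$\beta_2 = \frac{\mu_4}{\mu_2^{2}} = \frac{nPQ\left[1 + 3(n-2)PQ\right]}{n^{2}P^{2}Q^{2}} = 3 + \frac{1 - 6PQ}{nPQ}$$

$$\gamma_1 = \sqrt{\beta_1} = \frac{Q - P}{\sqrt{nPQ}} \quad \text{and} \quad \gamma_2 = \beta_2 - 3 = \frac{1 - 6PQ}{nPQ}$$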
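
The closed-form moments can be checked against direct summation over the probability mass function. The sketch below uses exact fractions; the values n = 6 and P = 1/3 and the helper names `raw_moment` and `central_moment` are arbitrary illustrative choices.

```python
from fractions import Fraction
from math import comb

n, P = 6, Fraction(1, 3)        # illustrative parameters
Q = 1 - P
pmf = [comb(n, x) * P**x * Q**(n - x) for x in range(n + 1)]

def raw_moment(k: int) -> Fraction:
    """k-th moment about the origin, E(X^k)."""
    return sum(Fraction(x)**k * f for x, f in enumerate(pmf))

mean = raw_moment(1)

def central_moment(k: int) -> Fraction:
    """k-th moment about the mean, E[(X - mean)^k]."""
    return sum((x - mean)**k * f for x, f in enumerate(pmf))

# Closed-form expressions for comparison.
mu3_origin = n*(n-1)*(n-2)*P**3 + 3*n*(n-1)*P**2 + n*P
mu4_origin = n*(n-1)*(n-2)*(n-3)*P**4 + 6*n*(n-1)*(n-2)*P**3 + 7*n*(n-1)*P**2 + n*P
mu3_mean = n*P*Q*(Q - P)
mu4_mean = n*P*Q*(1 + 3*(n - 2)*P*Q)
```

Because every quantity is a `Fraction`, agreement between the summed and closed-form values is exact rather than approximate.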


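Moment Generating Function of Binomial Distribution: Let X have a binomial
distribution with probability function,

$$P(x) = {}^{n}C_{x}\, P^{x} Q^{n-x}; \qquad x = 0, 1, 2, \ldots, n$$

The moment generating function about the origin is given as

$$M_0(t) = E(e^{tx}) = \sum_{x=0}^{n} e^{tx}\, {}^{n}C_{x}\, P^{x} Q^{n-x} = \sum_{x=0}^{n} {}^{n}C_{x}\,(Pe^{t})^{x} Q^{n-x}$$

$$M_0(t) = (Q + Pe^{t})^{n}$$

Moment generating function about mean nP is given by,

$$\begin{aligned}
M_{nP}(t) &= E\left[e^{t(x - nP)}\right] = E\left(e^{tx}.\,e^{-tnP}\right) \\
&= e^{-tnP}\, E(e^{tx}) = e^{-tnP}\, M_0(t) \\
&= e^{-tnP}\,(Q + Pe^{t})^{n} \\
&= \left(Q.e^{-tP} + P.e^{t - tP}\right)^{n}
\end{aligned}$$

$$M_{nP}(t) = \left(Q.e^{-tP} + P.e^{tQ}\right)^{n}$$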


Poisson Distribution with Mean and Variance 


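The Poisson distribution is a limiting case of binomial distribution when the probability
of success or failure (i.e., P or Q) is very small and the number of trials n is very
large (i.e., n → ∞), such that nP is a finite constant, say λ, i.e., nP = λ. Under
these conditions, P(x), the probability of x successes in the binomial distribution,

$$P(x) = P(X = x) = {}^{n}C_{x}\, P^{x} Q^{n-x}$$

can be written as,

$$P(x) = \frac{n!}{(n-x)!\,x!}\left(\frac{\lambda}{n}\right)^{x}\left(1 - \frac{\lambda}{n}\right)^{n-x}$$

Taking the limit as n → ∞, we have,

$$P(x) = \frac{\lambda^{x}}{x!}\,\lim_{n\to\infty}\left(1 - \frac{\lambda}{n}\right)^{n}\,\lim_{n\to\infty}\frac{n!}{(n-x)!\left(1 - \frac{\lambda}{n}\right)^{x} n^{x}} \qquad (6.1)$$

We know that $\lim_{n\to\infty}\left(1 - \frac{\lambda}{n}\right)^{n} = e^{-\lambda}$ and $\lim_{n\to\infty}\left(1 - \frac{\lambda}{n}\right)^{x} = 1$.

And using Stirling’s formula for n!, we have,

$$n! \approx \sqrt{2\pi}\; n^{n+1/2}\, e^{-n}$$

$$\frac{n!}{(n-x)!} \approx \frac{\sqrt{2\pi}\; n^{n+1/2}\, e^{-n}}{\sqrt{2\pi}\,(n-x)^{n-x+1/2}\, e^{-(n-x)}}$$

So,

$$\frac{n!}{(n-x)!\, n^{x}} = \frac{n^{n+1/2}\, e^{-n}}{(n-x)^{n-x+1/2}\, e^{-n+x}\, n^{x}}$$

Thus, Equation (6.1) becomes,

$$\begin{aligned}
P(x) &= \frac{\lambda^{x}}{x!}\, e^{-\lambda} \lim_{n\to\infty} \frac{n^{n+1/2}\, e^{-n}}{(n-x)^{n-x+1/2}\, e^{-n+x}\, n^{x}} \\
&= \frac{\lambda^{x} e^{-\lambda}}{x!\, e^{x}} \lim_{n\to\infty} \frac{n^{n-x+1/2}}{(n-x)^{n-x+1/2}} \\
&= \frac{\lambda^{x} e^{-\lambda}}{x!\, e^{x}} \lim_{n\to\infty} \frac{n^{n-x+1/2}}{n^{n-x+1/2}\left(1 - \frac{x}{n}\right)^{n-x+1/2}} \\
&= \frac{\lambda^{x} e^{-\lambda}}{x!\, e^{x}} \lim_{n\to\infty} \frac{1}{\left(1 - \frac{x}{n}\right)^{n}\left(1 - \frac{x}{n}\right)^{-x+1/2}} \\
&= \frac{\lambda^{x} e^{-\lambda}}{x!\, e^{x}} \cdot \frac{1}{e^{-x}.1} = \frac{\lambda^{x} e^{-\lambda}}{x!}
\end{aligned}$$

Thus,

$$P(X = x) = P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}; \qquad x = 0, 1, 2, \ldots$$

when n → ∞, nP = λ and P → 0.


Here, λ is known as the parameter of Poisson distribution.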
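
The limiting behaviour can also be observed numerically. In the sketch below nP is held fixed at λ while n grows, and the binomial probability of x successes is compared with the Poisson value. The choices λ = 2, x = 3 and the sequence of n values are illustrative assumptions.

```python
from math import comb, exp, factorial

lam, x = 2.0, 3                 # illustrative values
poisson_prob = lam**x * exp(-lam) / factorial(x)

errors = []
for n in (10, 100, 1000, 10000):
    P = lam / n                 # keep nP = lambda fixed as n grows
    binomial_prob = comb(n, x) * P**x * (1 - P)**(n - x)
    errors.append(abs(binomial_prob - poisson_prob))
    print(n, round(binomial_prob, 6), round(poisson_prob, 6))
```

The absolute error shrinks by roughly a factor of ten each time n is multiplied by ten, which is in line with the limit derived above.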


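Definition: A random variable X is said to follow a Poisson distribution if
it assumes only non-negative values and its probability mass function is
given by,

$$P(x, \lambda) = P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}; \qquad x = 0, 1, 2, \ldots;\ \lambda > 0$$

Constants of the Poisson Distribution:

(i) Mean, $\mu'_1 = \lambda = E(X)$.

(ii) $\mu'_2 = E(X^{2}) = \lambda + \lambda^{2}$.

(iii) $\mu'_3 = E(X^{3}) = \lambda^{3} + 3\lambda^{2} + \lambda$.

(iv) $\mu'_4 = E(X^{4}) = \lambda^{4} + 6\lambda^{3} + 7\lambda^{2} + \lambda$.

(v) First moment about mean $\mu_1 = 0$.

(vi) $V(X) = \mu_2 = E(X^{2}) - [E(X)]^{2} = \mu'_2 - (\mu'_1)^{2} = \lambda + \lambda^{2} - \lambda^{2} = \lambda$

Note that, Mean = Variance = λ.

(vii) Standard deviation $\sigma(X) = \sqrt{\lambda}$.

(viii) $\mu_3$, i.e., the third moment about mean

$$\mu_3 = \mu'_3 - 3\mu'_2\,\mu'_1 + 2(\mu'_1)^{3} = (\lambda^{3} + 3\lambda^{2} + \lambda) - 3\lambda(\lambda + \lambda^{2}) + 2\lambda^{3} = \lambda$$

(ix) $\mu_4 = \mu'_4 - 4\mu'_3\,\mu'_1 + 6\mu'_2\,(\mu'_1)^{2} - 3(\mu'_1)^{4} = 3\lambda^{2} + \lambda$.

Thus, coefficients of skewness and kurtosis are given by,

$$\beta_1 = \frac{\mu_3^{2}}{\mu_2^{3}} = \frac{\lambda^{2}}{\lambda^{3}} = \frac{1}{\lambda} \quad \text{and} \quad \gamma_1 = \sqrt{\beta_1} = \frac{1}{\sqrt{\lambda}}$$

Also,

$$\beta_2 = \frac{\mu_4}{\mu_2^{2}} = 3 + \frac{1}{\lambda} \quad \text{and} \quad \gamma_2 = \beta_2 - 3 = \frac{1}{\lambda}$$

Hence, the Poisson distribution is always a skewed distribution. If we take the
limit as λ → ∞, we get,

$$\beta_1 = 0,\ \beta_2 = 3.$$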
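
These constants can be checked numerically by summing over the probability mass function. The sketch below assumes λ = 2.5 and truncates the infinite sum at 100 terms, both of which are illustrative choices; the neglected tail is far below floating-point precision for this λ.

```python
from math import exp, factorial

lam = 2.5                       # illustrative parameter value
pmf = [lam**x * exp(-lam) / factorial(x) for x in range(100)]

def raw_moment(k: int) -> float:
    """Approximate E(X^k) from the truncated pmf."""
    return sum(x**k * f for x, f in enumerate(pmf))

mean = raw_moment(1)

def central_moment(k: int) -> float:
    """Approximate E[(X - mean)^k] from the truncated pmf."""
    return sum((x - mean)**k * f for x, f in enumerate(pmf))

beta_1 = central_moment(3)**2 / central_moment(2)**3   # should be close to 1/lam
beta_2 = central_moment(4) / central_moment(2)**2      # should be close to 3 + 1/lam
```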


Negative Binomial Distribution: Suppose there are n trials of an event. We
assume that


(i) The n trials are independent.
(ii) P is the probability of success which remains constant from trial to trial.


Let f(x; r, P) denote the probability that there are x failures preceding the
rth success in (x + r) trials.


Now, in (x + r) trials the last trial must be a success with probability P. Then
in the remaining (x + r - 1) trials, we must have (r - 1) successes whose probability
is given by

$${}^{x+r-1}C_{r-1}\, P^{r-1} Q^{x}$$

Therefore, by the compound probability theorem, f(x; r, P) is given by the
product of these two probabilities.

$$\text{So, } f(x; r, P) = {}^{x+r-1}C_{r-1}\, P^{r-1} Q^{x} . P = {}^{x+r-1}C_{r-1}\, P^{r} Q^{x}$$
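Moment Generating Function of Poisson Distribution


Let $P(X = x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}$; x = 0, 1, 2, ..., ∞; λ > 0 be a Poisson distribution.
Then the moment generating function is given by,

$$\begin{aligned}
M_0(t) = E(e^{tx}) &= \sum_{x=0}^{\infty} e^{tx}\, \frac{e^{-\lambda}\lambda^{x}}{x!} \\
&= e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda e^{t})^{x}}{x!} \\
&= e^{-\lambda}\, e^{\lambda e^{t}} = e^{\lambda(e^{t} - 1)}
\end{aligned}$$

and moment generating function about mean is,

$$M_{\lambda}(t) = E\left(e^{t(x - \lambda)}\right) = e^{-\lambda t}\, E(e^{tx}) = e^{-\lambda t}\, e^{\lambda(e^{t} - 1)}$$

$$\text{So, } M_{\lambda}(t) = e^{\lambda(e^{t} - t - 1)}$$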

Definition: A random variable X is said to follow a negative binomial distribution
if its probability mass function is given by,

$$P(X = x) = P(x) = \begin{cases} {}^{x+r-1}C_{r-1}\, P^{r} Q^{x}; & x = 0, 1, 2, \ldots \\ 0; & \text{otherwise} \end{cases}$$

Also, we know that ${}^{n}C_{r} = {}^{n}C_{n-r}$.

So,

$$\begin{aligned}
{}^{x+r-1}C_{r-1} &= {}^{x+r-1}C_{x} \\
&= \frac{(x+r-1)(x+r-2)\cdots(r+1)\,r}{x!} \\
&= \frac{(-1)^{x}(-r)(-r-1)\cdots(-r-x+2)(-r-x+1)}{x!} \\
&= (-1)^{x}\; {}^{-r}C_{x}
\end{aligned}$$

$$\therefore\ P(x) = \begin{cases} {}^{-r}C_{x}\, P^{r} (-Q)^{x}; & x = 0, 1, 2, \ldots \\ 0; & \text{otherwise} \end{cases}$$

which is the (x + 1)th term in the expansion of $P^{r}(1 - Q)^{-r}$, a binomial expansion
with a negative index. Hence, the distribution is called a negative binomial
distribution. Also,

$$\sum_{x=0}^{\infty} P(x) = P^{r} \sum_{x=0}^{\infty} {}^{-r}C_{x}\,(-Q)^{x} = P^{r}(1 - Q)^{-r} = 1$$

Therefore, P(x) represents the probability function and the discrete variable
which follows this probability function is called the negative binomial variable.
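
A short sketch of the negative binomial probability function follows. The function name and the values r = 3, P = 0.4 are assumptions chosen for illustration, and the infinite sum is truncated at 500 terms.

```python
from math import comb

def negative_binomial_pmf(x: int, r: int, P: float) -> float:
    """Probability of x failures preceding the r-th success."""
    Q = 1.0 - P
    return comb(x + r - 1, r - 1) * P**r * Q**x

r, P = 3, 0.4                   # illustrative parameters

# The probabilities over x = 0, 1, 2, ... should add up to 1;
# 500 terms are far more than enough for these parameters.
total = sum(negative_binomial_pmf(x, r, P) for x in range(500))
```

For x = 0 the function returns P to the power r, the probability that the first r trials are all successes.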
Example 6: A continuous random variable X has the probability density function
f(x) = 3x², 0 ≤ x ≤ 1.

Find a and b such that

(i) P{X ≤ a} = P{X > a}, and

(ii) P{X > b} = 0.05

Solution: (i) Since P{X ≤ a} = P{X > a}, each must be equal to 1/2, because total probability is always 1.

$$P\{X \le a\} = \frac{1}{2} \;\Rightarrow\; \int_{0}^{a} 3x^{2}\,dx = \frac{1}{2} \;\Rightarrow\; a^{3} = \frac{1}{2} \;\Rightarrow\; a = \left(\frac{1}{2}\right)^{1/3}$$

(ii) P{X > b} = 0.05.

$$\int_{b}^{1} f(x)\,dx = 0.05 \;\Rightarrow\; 3\int_{b}^{1} x^{2}\,dx = 0.05 \;\Rightarrow\; 1 - b^{3} = \frac{1}{20} \;\Rightarrow\; b = \left(\frac{19}{20}\right)^{1/3}$$
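The two answers of Example 6 can be verified with a few lines of arithmetic. This is a sketch under the assumption that the density is f(x) = 3x² on [0, 1], so that the distribution function is F(x) = x³; the helper name `cdf` is introduced here for illustration.

```python
def cdf(x: float) -> float:
    """Distribution function F(x) = x^3 for the density f(x) = 3x^2 on [0, 1]."""
    return x**3

a = 0.5 ** (1 / 3)          # solves P{X <= a} = 1/2
b = (19 / 20) ** (1 / 3)    # solves P{X > b} = 0.05

print(a, cdf(a))            # cdf(a) should be close to 0.5
print(b, 1 - cdf(b))        # upper-tail probability should be close to 0.05
```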


Example 7: A probability curve y = f(x) has a range from 0 to ∞. If
$f(x) = e^{-x}$, find the mean and variance and the third moment about the mean.

Solution: We know that the rth moment about the origin is,

$$\mu'_{r} = \int_{0}^{\infty} x^{r} f(x)\,dx = \int_{0}^{\infty} x^{r} e^{-x}\,dx = \Gamma(r + 1) = r! \quad \text{(using the Gamma integral)}$$

Substituting r = 1, 2 and 3, we have,

Mean, $\mu'_{1} = 1! = 1$
$\mu'_{2} = 2! = 2$ and $\mu'_{3} = 3! = 6$

Thus, variance $= \mu_{2} = \mu'_{2} - (\mu'_{1})^{2} = 2 - 1 = 1$

And $\mu_{3} = \mu'_{3} - 3\mu'_{2}\,\mu'_{1} + 2(\mu'_{1})^{3} = 6 - 3 \times 2 + 2 = 2$ is the required third
moment about the mean.
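The conversion from moments about the origin to moments about the mean used in Example 7 can be sketched as follows. The dictionary name `raw` is an assumption of this illustration; `math.gamma` supplies the Gamma integral Γ(r + 1) = r!.

```python
from math import gamma

# Raw moments of f(x) = e^{-x} on (0, infinity): mu'_r = Gamma(r + 1) = r!
raw = {r: gamma(r + 1) for r in (1, 2, 3)}

mean = raw[1]
variance = raw[2] - raw[1] ** 2                         # mu_2 = mu'_2 - (mu'_1)^2
mu3 = raw[3] - 3 * raw[2] * raw[1] + 2 * raw[1] ** 3    # third central moment

print(mean, variance, mu3)
```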


Uniform Distribution

A random variable X is said to have a continuous uniform distribution over an
interval (a, b) if its probability density function is constant, say k, over the entire
range of X. That is,

$$f(x) = \begin{cases} k, & a < x < b \\ 0, & \text{otherwise} \end{cases}$$

Since total probability is always unity, we have,

$$\int_{a}^{b} f(x)\,dx = 1 \;\Rightarrow\; k\int_{a}^{b} dx = 1$$

Or, $k = \dfrac{1}{b - a}$

Thus,
$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a < x < b \\ 0, & \text{otherwise} \end{cases}$$

This is also known as the rectangular distribution, as the curve y = f(x) describes
a rectangle over the x-axis and between the ordinates at x = a and x = b.

The distribution function F(x) is given by,

$$F(x) = \begin{cases} 0, & -\infty < x < a \\ \dfrac{x - a}{b - a}, & a \le x \le b \\ 1, & b < x < \infty \end{cases}$$

F(x) is continuous, but it has corners at x = a and x = b, so it is not differentiable at
these points. Thus, inside the interval, $\dfrac{d}{dx}F(x) = f(x) = \dfrac{1}{b - a} \ne 0$, and the derivative exists everywhere except at
the points x = a and x = b. Consequently, the probability density function f(x)
is given by,

$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a < x < b \\ 0, & \text{otherwise} \end{cases}$$

Fig. 6.5 Uniform Distribution
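Moments of Uniform Distribution:

$$\mu'_{r} = \int_{a}^{b} x^{r} f(x)\,dx = \frac{1}{b - a}\left[\frac{b^{r+1} - a^{r+1}}{r + 1}\right]$$

In particular,

$$\text{Mean} = \mu'_{1} = \frac{1}{b - a}\left[\frac{b^{2} - a^{2}}{2}\right] = \frac{b + a}{2}$$

And
$$\mu'_{2} = \frac{1}{b - a}\left[\frac{b^{3} - a^{3}}{3}\right] = \frac{1}{3}\left(b^{2} + ab + a^{2}\right)$$

$$\text{Variance} = \mu_{2} = \mu'_{2} - (\mu'_{1})^{2} = \frac{1}{3}\left(b^{2} + ab + a^{2}\right) - \left(\frac{b + a}{2}\right)^{2} = \frac{1}{12}(b - a)^{2}$$

The moment generating function is given by,

$$M_{X}(t) = \int_{a}^{b} e^{tx} f(x)\,dx = \frac{e^{bt} - e^{at}}{t(b - a)}$$

And the characteristic function is given by,

$$\phi_{X}(t) = \int_{a}^{b} e^{itx} f(x)\,dx = \frac{e^{ibt} - e^{iat}}{it(b - a)}$$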
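The closed-form moments of the uniform distribution can be checked against the mean (a + b)/2 and the variance (b - a)²/12. The following sketch assumes a hypothetical interval (a, b) = (2, 5); the function name `uniform_raw_moment` is introduced only for this illustration.

```python
def uniform_raw_moment(r: int, a: float, b: float) -> float:
    """r-th moment about the origin of the uniform distribution on (a, b)."""
    return (b ** (r + 1) - a ** (r + 1)) / ((r + 1) * (b - a))

# Hypothetical interval, chosen only for illustration.
a, b = 2.0, 5.0

mean = uniform_raw_moment(1, a, b)
variance = uniform_raw_moment(2, a, b) - mean**2

print(mean, (a + b) / 2)              # both are the mean
print(variance, (b - a) ** 2 / 12)    # both are the variance
```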
Check Your Progress 


10. Who discovered binomial distribution? 


11. When is a random variable X said to follow binomial distribution? 


12. What is Poisson distribution? 


6.6 ANSWERS TO ‘CHECK YOUR PROGRESS’ 


1. Bernoulli process or Binomial distribution is considered appropriate and has
the following characteristics:

(a) Dichotomy: This means that each trial has only two mutually exclusive
possible outcomes. For example, success or failure, yes or no, heads or
tails, etc.

(b) Stability: This means that the probability of the outcome of any trial is
known and remains fixed over time, i.e., remains the same for all the
trials.

(c) Independence: This means that the trials are statistically independent,
i.e., the happening of an outcome or event in any particular trial
is independent of its happening in any other trial or trials.

2. The probability function of binomial distribution is written as under:

$$f(X = r) = {}^{n}C_{r}\, p^{r} q^{n-r}, \quad r = 0, 1, 2, \ldots, n$$

Where, n = Number of trials.
p = Probability of success in a single trial.
q = (1 - p) = Probability of failure in a single trial.
r = Number of successes in n trials.


3. The parameters of binomial distribution are p and n, where p specifies the
probability of success in a single trial and n specifies the number of trials.

4. The important measures of binomial distribution are:

$$\text{Skewness} = \frac{1 - 2p}{\sqrt{n \cdot p \cdot q}}$$

$$\text{Kurtosis} = 3 + \frac{1 - 6p + 6p^{2}}{n \cdot p \cdot q}$$

5. We need to use binomial distribution under the following circumstances:

(a) When we have to find the probability of heads in 10 throws of a fair coin.

(b) When we have to find the probability that 3 out of 10 items produced by
a machine, which produces 8% defective items on an average, will be
defective.

6. Poisson distribution is a discrete probability distribution that is frequently used
in the context of Operations Research. Unlike binomial distribution, Poisson
distribution cannot be deduced on purely theoretical grounds based on the
conditions of the experiment. In fact, it must be based on experience, i.e.,
on the empirical results of past experiments relating to the problem under
study.

7. Poisson distribution is used when the probability of happening of an event is very
small and n is very large, such that the average of the series is a finite number. This
distribution is good for calculating the probabilities associated with X
occurrences in a given time period or specified area.

8. For measuring the area under a curve, we make use of the statistical tables
constructed by mathematicians. Using these tables, we can find the probability that
the normally distributed random variable will lie within certain distances from
the mean. These distances are defined in terms of standard deviations. While
using the tables showing the area under the normal curve, it is considered in
terms of the standard variate, which means standard deviations without units of
measurement, and it is calculated as:

$$Z = \frac{X - \mu}{\sigma}$$

Where, Z = The standard variate or number of standard deviations from X
to the mean of the distribution.

X = Value of the random variable under consideration.
μ = Mean of the distribution of the random variable.
σ = Standard deviation of the distribution.


9. When n is large, approaching infinity, and p is small, approaching zero,
Poisson distribution is considered as an approximation of binomial distribution.


10. Binomial distribution was discovered by James Bernoulli. 


11. A random variable X is said to follow binomial distribution if it assumes only
non-negative values and its probability mass function is given by,

$$P(X = k) = p(k) = \begin{cases} {}^{n}C_{k}\, P^{k} Q^{n-k}, & k = 0, 1, 2, \ldots, n;\; Q = 1 - P \\ 0, & \text{otherwise} \end{cases}$$


12. Poisson distribution is a limiting case of binomial distribution when the 
probability of success or failure is very small and the number of trial n is very 
large. 


6.7 SUMMARY


• The binomial distribution describes discrete data resulting from what is often
called the Bernoulli process. The tossing of a fair coin a fixed number of
times is a Bernoulli process and the outcome of such tosses can be
represented by the binomial distribution. The name of the Swiss mathematician
Jacob Bernoulli is associated with this distribution.

• The expected value of the random variable [i.e., E(X)], or mean of the random
variable, of the binomial distribution is equal to n.p and the variance
of the random variable is equal to n.p.q or n.p.(1 - p).

• Unlike binomial distribution, Poisson distribution cannot be deduced on
purely theoretical grounds based on the conditions of the experiment. In
fact, it must be based on experience, i.e., on the empirical results of past
experiments relating to the problem under study.

• Poisson distribution is appropriate especially when the probability of happening
of an event is very small (so that q or (1 - p) is almost equal to unity) and n
is very large, such that the average of the series (viz., n.p) is a finite number.

• Poisson distribution depends upon the value of λ, the average number of
occurrences per specified interval, which is its only parameter.

• If a certain hypothesis is assumed, it is sometimes possible to derive
mathematically what the frequency distributions of certain universes should
be. Such distributions are called theoretical distributions.

• A random variable X is said to follow binomial distribution if it assumes only
non-negative values and its probability mass function is given by,

$$P(X = k) = p(k) = \begin{cases} {}^{n}C_{k}\, P^{k} Q^{n-k}, & k = 0, 1, 2, \ldots, n;\; Q = 1 - P \\ 0, & \text{otherwise} \end{cases}$$

• The Poisson distribution is a limiting case of binomial distribution when the
probability of success or failure (i.e., P or Q) is very small and the number
of trials n is very large (i.e., n → ∞), such that nP is a finite constant, say
λ, i.e., nP = λ.

• A random variable X is said to follow a Poisson distribution if it
assumes only non-negative values and its probability mass function
is given by

$$P(X = x) = p(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}; \quad x = 0, 1, 2, \ldots;\; \lambda > 0.$$

• A random variable X is said to have a continuous uniform distribution over
an interval (a, b) if its probability density function is constant, say k, over the
entire range of X.


6.8 KEY WORDS 


• Binomial distribution: It is also called the Bernoulli process and is used to
describe a discrete random variable.

• Poisson distribution: It is used to describe the empirical results of past
experiments relating to the problem and plays an important role in queuing
theory, inventory control problems and risk models.


6.9 SELF-ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 
1. Define probability distribution and probability functions. 
2. Describe binomial distribution and its measures. 


3. How can a binomial distribution be fitted to given data?
4. Describe Poisson distribution and its important measures.
5. Poisson distribution can be an approximation of binomial distribution.
Explain.
6. When is the Poisson distribution used?
7. Write the formula for measuring the area under the curve.
8. Explain the circumstances when the normal probability distribution can be
used.


Long-Answer Questions 
1. Given is the following probability distribution:

   X_i     pr(X_i)
   0       1/8
   1       2/8
   2       3/8
   3       2/8

Calculate the expected value of X, its variance, and standard deviation.

2. A coin is tossed 3 times. Let X be the number of runs in the sequence of
outcomes: first toss, second toss, third toss. Find the probability distribution
of X. What values of X are most probable?

3. (a) Explain the meaning of Bernoulli process pointing out its main
characteristics.

(b) Give a few examples narrating some situations wherein binomial
distribution can be used.

4. State the distinctive features of the Binomial, Poisson and Normal probability
distributions. When does a Binomial distribution tend to become a Normal
and a Poisson distribution? Explain.

5. Explain the circumstances when the following probability distributions are
used:

(a) Binomial distribution
(b) Poisson distribution
(c) Normal distribution

6. Certain articles, of which 0.5 per cent are defective, are
packed in cartons, each containing 130 articles. What proportion of cartons
are free from defective articles? What proportion of cartons contain 2 or
more defective articles?

(Given $e^{-0.5} = 0.6065$).

7. The following mistakes per page were observed in a book:

   No. of Mistakes     No. of Times the Mistake
   Per Page            Occurred
   0                   211
   1                   90
   2                   19
   3                   5
   4                   0
   Total               345

Fit a Poisson distribution to the data given above and test the goodness of
fit.

8. In a distribution exactly normal, 7 per cent of the items are under 35 and 89
per cent are under 63. What are the mean and standard deviation of the
distribution?

9. Assume the mean height of soldiers to be 68.22 inches with a variance of
10.8 square inches. How many soldiers in a regiment of 1000 would you expect to
be over six feet tall?
UNIT 7 SPECIAL DISTRIBUTIONS


Structure 
7.0 Introduction 
7.1 Objectives 
7.2 The Gamma Distribution 
7.3 Chi-Square Distribution 
7.4 The Normal Distribution 
7.5 The Bivariate Normal Distribution 
7.6 Answers to Check Your Progress Questions 
7.7 Summary 
7.8 Key Words 
7.9 Self-Assessment Questions and Exercises 
7.10 Further Readings 


7.0 INTRODUCTION 


Any statistical hypothesis test in which the test statistic has a Chi-square distribution
when the null hypothesis is true is termed a Chi-square test. The Chi-square test is a
non-parametric test of statistical significance for bivariate tabular analysis, also
known as cross-breaks. Amongst the several tests used in statistics for judging the
significance of sampling data, the Chi-square test, developed by Karl Pearson, is
considered an important test. Chi-square, symbolically written as χ² (pronounced
as Ki-square), is a statistical measure with the help of which it is possible to assess
the significance of the difference between the observed frequencies and the expected
frequencies obtained from some hypothetical universe. Chi-square tests enable us
to test and compare whether more than two population proportions can be
considered equal. Hence, it is a statistical test commonly used to compare observed
data with expected data and to test the null hypothesis, which states that there is
no significant difference between the expected and the observed result.


In this unit, you will study about the Gamma distribution, Chi-square 
distribution, the normal distribution and the bivariate normal distribution. 


7.1 OBJECTIVES 


After going through this unit, you will be able to: 
• Understand the Gamma distribution
• Explain the Chi-square distribution
• Describe the normal distribution
• Analyse the bivariate normal distribution


7.2 THE GAMMA DISTRIBUTION 


The Erlang distribution is a continuous probability distribution with wide 
applicability primarily due to its relation to the Exponential and Gamma distributions. 
The Erlang distribution was developed by A. K. Erlang. He developed the Erlang 
distribution to examine the number of telephone calls which might be made at the 
same time to the operators of the switching stations. This work on telephone traffic 
engineering has been expanded to consider waiting times in queuing systems in 
general. Erlang distribution is now used in the fields of stochastic processes and 
biomathematics. 


Given a Poisson distribution with a rate of change λ, the distribution function
D(x) giving the waiting times until the hth Poisson event is:

$$D(x) = 1 - \sum_{k=0}^{h-1} \frac{(\lambda x)^{k}}{k!}\, e^{-\lambda x} = 1 - \frac{\Gamma(h, \lambda x)}{\Gamma(h)}$$

For x ∈ (0, ∞), where Γ(x) is the complete gamma function, and Γ(a, x) an
incomplete gamma function. With h explicitly an integer, this distribution is known
as the Erlang distribution and has the following probability function:

$$P(x) = \frac{\lambda(\lambda x)^{h-1}}{(h-1)!}\, e^{-\lambda x}$$

It is closely related to the gamma distribution, which is obtained by letting α
= h (not necessarily an integer) and defining θ = 1/λ. When h = 1, it simplifies to
the exponential distribution.

The probability density function of the Erlang distribution is given below:

$$f(x; k, \lambda) = \frac{\lambda^{k} x^{k-1} e^{-\lambda x}}{(k-1)!} = \frac{\lambda^{k} x^{k-1} e^{-\lambda x}}{\Gamma(k)}, \quad x \ge 0$$

Where, Γ(k) is the gamma function evaluated at k, the parameter k is called
the shape parameter and the parameter λ is called the rate parameter.

An alternative but equivalent parameterization (Gamma distribution) uses
the scale parameter μ, which is the reciprocal of the rate parameter (i.e., μ = 1/λ):

$$f(x; k, \mu) = \frac{x^{k-1} e^{-x/\mu}}{\mu^{k}\,(k-1)!}, \quad x \ge 0$$
When the scale parameter μ equals 2, the distribution simplifies to the Chi-square
distribution with 2k degrees of freedom. It can, therefore, be regarded as
a generalized Chi-square distribution for even numbers of degrees of freedom.

Because of the factorial function in the denominator, the Erlang distribution
is only defined when the parameter k is a positive integer. In fact, this distribution
is sometimes called the Erlang-k distribution (for example, an Erlang-2 distribution
is an Erlang distribution with k = 2). The Gamma distribution generalizes the Erlang
distribution by allowing k to be any positive real number, using the Gamma function instead
of the factorial function.


The Cumulative Distribution Function (CDF) of the Erlang distribution is
given below:

$$F(x; k, \lambda) = \frac{\gamma(k, \lambda x)}{(k-1)!}$$

Where, γ( ) is the lower incomplete gamma function. The CDF may also
be expressed as follows:

$$F(x; k, \lambda) = 1 - \sum_{n=0}^{k-1} \frac{1}{n!}\, e^{-\lambda x} (\lambda x)^{n}$$

An asymptotic expansion is known for the median of an Erlang distribution,
for which coefficients can be computed and bounds are known.
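The finite-sum form of the Erlang CDF lends itself to direct computation. The sketch below is an illustration only; the function name `erlang_cdf` and the sample arguments are assumptions, not part of the original text.

```python
from math import exp, factorial

def erlang_cdf(x: float, k: int, lam: float) -> float:
    """Erlang CDF: F(x; k, lam) = 1 - sum_{n=0}^{k-1} e^{-lam*x} (lam*x)^n / n!"""
    if x <= 0:
        return 0.0
    tail = sum(exp(-lam * x) * (lam * x) ** n / factorial(n) for n in range(k))
    return 1.0 - tail

# With k = 1 the Erlang distribution is the exponential distribution,
# so the CDF reduces to 1 - e^{-lam*x}.
print(erlang_cdf(2.0, 1, 0.5), 1 - exp(-1.0))

# Hypothetical shape and rate, chosen only for illustration.
print(erlang_cdf(1.0, 3, 2.0), erlang_cdf(2.0, 3, 2.0))
```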


Generating Erlang Distributed Random Numbers 


Erlang distributed random numbers can be generated from uniformly distributed
random numbers (U ∈ (0, 1)) using the following formula:

$$E(k, \lambda) \approx -\frac{1}{\lambda} \ln \prod_{i=1}^{k} U_{i}$$
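The generation formula, E(k, λ) = -(1/λ) ln(U_1 U_2 ... U_k), can be sketched in a few lines. The function name `erlang_sample`, the seed, and the parameter values k = 3, λ = 2 are assumptions made for this illustration; the sample mean is compared with the theoretical mean k/λ.

```python
import math
import random

def erlang_sample(k: int, lam: float, rng: random.Random) -> float:
    """Draw one Erlang(k, lam) value as -(1/lam) * ln(U_1 * ... * U_k)."""
    product = 1.0
    for _ in range(k):
        # rng.random() lies in [0, 1); using 1 - U keeps the value in (0, 1]
        # so the logarithm is always defined.
        product *= 1.0 - rng.random()
    return -math.log(product) / lam

rng = random.Random(0)     # fixed seed so the run is repeatable
k, lam = 3, 2.0            # hypothetical shape and rate
samples = [erlang_sample(k, lam, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)

print(sample_mean, k / lam)   # sample mean should be close to k / lam
```

For large k the product of many uniforms can underflow; summing the logarithms of the individual U_i instead is a common alternative.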


Waiting Times 


Events that occur independently with some average rate are modeled with a Poisson 
process. The waiting times between k occurrences of the event are Erlang 
distributed. 


The Erlang distribution, which measures the time between incoming calls, 
can be used in conjunction with the expected duration of incoming calls to produce 
information about the traffic load measured in Erlang units. This can be used to 
determine the probability of packet loss or delay, according to various assumptions 
made about whether blocked calls are aborted (Erlang B formula) or queued until 
served (Erlang C formula). The Erlang B and C formulae are still in everyday use 
for traffic modelling for applications, such as the design of call centers. 


A.K. Erlang worked extensively on traffic modelling. Thus, there are two other
Erlang distributions which are used in modelling traffic. They are given below:

• Erlang B Distribution: This is the easier of the two distributions and can
be used in a call centre to calculate the number of trunks one needs to carry
a certain amount of phone traffic with a certain ‘target service’.

• Erlang C Distribution: This formula is much more difficult and is often
used to calculate how long callers will have to wait before being connected
to a human in a call centre or similar situation.


Stochastic Processes 


The Erlang distribution is the distribution of the sum of k independent and identically
distributed random variables, each having an exponential distribution with rate λ. The long-run
rate at which events occur is the reciprocal of the expectation of X, that is, λ/k. The
(age specific event) rate of the Erlang distribution is, for k > 1, monotonic in x,
increasing from zero at x = 0 to λ as x tends to infinity.


Check Your Progress 


1. Define Erlang-k distribution. 
2. What is Erlang C distribution? 


7.3 CHI-SQUARE DISTRIBUTION 


Chi-square test is a non-parametric test of statistical significance for bivariate tabular
analysis (also known as cross-breaks). Any appropriate test of statistical significance
lets you know the degree of confidence you can have in accepting or rejecting a
hypothesis. Typically, the Chi-square test is any statistical hypothesis test in which
the test statistic has a Chi-square distribution when the null hypothesis is true. It is
performed on different samples (of people) who are different enough in some
characteristic or aspect of their behaviour that we can generalize from the samples
selected. The population from which our samples are drawn should also be different
in the behaviour or characteristic. Amongst the several tests used in statistics for
judging the significance of the sampling data, the Chi-square test, developed by Karl
Pearson, is considered an important test. Chi-square, symbolically written as χ²
(pronounced as Ki-square), is a statistical measure with the help of which it is
possible to assess the significance of the difference between the observed
frequencies and the expected frequencies obtained from some hypothetical universe.
Chi-square tests enable us to test whether more than two population proportions
can be considered equal. In order that the Chi-square test may be applicable, both
the frequencies must be grouped in the same way and the theoretical distribution
must be adjusted to give the same total frequency as that of the observed
frequencies. χ² is calculated with the help of the following formula:

$$\chi^{2} = \sum \frac{(f_{o} - f_{e})^{2}}{f_{e}}$$

Where, $f_{o}$ means the observed frequency; and

$f_{e}$ means the expected frequency.
Whether or not a calculated value of χ² is significant can be ascertained by
looking at the tabulated values of χ² (given at the end of this book in the appendix
part) for given degrees of freedom at a certain level of significance (generally a
5% level is taken). If the calculated value of χ² exceeds the table value, the
difference between the observed and expected frequencies is taken as
significant, but if the table value is more than the calculated value of χ², then the
difference between the observed and expected frequencies is considered as
insignificant, i.e., considered to have arisen as a result of chance and as such
can be ignored.


Degrees of Freedom 


As already stated in the earlier unit, the number of independent constraints
determines the number of degrees of freedom (or df). If there are 10 frequency
classes and there is one independent constraint, then there are (10 - 1) = 9
degrees of freedom. Thus, if n is the number of groups and one constraint is
placed by making the totals of observed and expected frequencies equal, df =
(n - 1); when two constraints are placed by making the totals as well as the
arithmetic means equal, then df = (n - 2), and so on. In the case of a contingency
table (i.e., a table with two columns and more than two rows, or a table with two
rows but more than two columns, or a table with more than two rows and more
than two columns) or in the case of a 2 × 2 table, the degrees of freedom is
worked out as follows:

df = (c - 1)(r - 1)

Where, c = Number of columns
r = Number of rows


Conditions for the Application of Test 


The following conditions should be satisfied before the test can be applied: 
(i) Observations recorded and used are collected on a random basis. 
(ii) All the members (or items) in the sample must be independent. 


(iii) No group should contain very few items say less than 10. In cases 
where the frequencies are less than 10, regrouping is done by combining 
the frequencies of adjoining groups so that the new frequencies become 
greater than 10. Some statisticians take this number as 5, but 10 is 
regarded as better by most of the statisticians. 


(iv) The overall number of items (i.e., N) must be reasonably large. It 
should at least be 50, howsoever small the number of groups may be. 


(v) The constraints must be linear. Constraints which involve linear equations 
in the cell frequencies of a contingency table (i.e., equations containing 
no squares or higher powers of the frequencies) are known as linear 
constraints. 


Areas of Application of Chi-Square Test 


Chi-square test is applicable in a large number of problems. The test is, in fact,
a technique through the use of which it is possible for us to (a) Test the
goodness of fit; (b) Test the homogeneity of a number of frequency distributions;
and (c) Test the significance of association between two attributes. In other
words, Chi-square test is a test of independence, goodness of fit and
homogeneity. At times, Chi-square test is used as a test of population variance
also.

As a Test of Goodness of Fit, χ² test enables us to see how well the
distribution of observed data fits the assumed theoretical distribution, such as
Binomial distribution, Poisson distribution or the Normal distribution.

As a Test of Independence, χ² test helps explain whether or not two
attributes are associated. For instance, we may be interested in knowing
whether a new medicine is effective in controlling fever or not, and χ² test will
help us in deciding this issue. In such a situation, we proceed on the null
hypothesis that the two attributes (viz., new medicine and control of fever) are
independent, which means that the new medicine is not effective in controlling
fever. It may, however, be stated here that χ² is not a measure of the degree
of relationship or the form of relationship between two attributes; it simply
is a technique of judging the significance of such association or relationship
between two attributes.

As a Test of Homogeneity, χ² test helps us in stating whether different
samples come from the same universe. Through this test, we can also explain
whether the results worked out on the basis of sample/samples are in conformity
with a well defined hypothesis or the results fail to support the given hypothesis.
As such, the test can be taken as an important decision-making technique.

As a Test of Population Variance, Chi-square is also used to test the
significance of population variance through confidence intervals, especially in case
of small samples.


Steps Involved in Finding the Value of Chi-Square

The various steps involved are as follows:

(i) First of all, calculate the expected frequencies.

(ii) Obtain the difference between observed and expected frequencies and
find out the squares of these differences, i.e., calculate $(f_{o} - f_{e})^{2}$.

(iii) Divide the quantity $(f_{o} - f_{e})^{2}$ obtained, as stated above, by the
corresponding expected frequency to get $\dfrac{(f_{o} - f_{e})^{2}}{f_{e}}$.

(iv) Then find the summation of the $\dfrac{(f_{o} - f_{e})^{2}}{f_{e}}$ values, or what we call
$\sum \dfrac{(f_{o} - f_{e})^{2}}{f_{e}}$. This is the required χ² value.

The χ² value obtained as such should be compared with the relevant table value of
χ² and inference may be drawn as stated above.


The following examples illustrate the use of Chi-square test. 


Example 1: A dice is thrown 132 times with the following results:

   Number Turned Up    1    2    3    4    5    6
   Frequency           16   20   25   14   29   28

Test the hypothesis that the dice is unbiased.

Solution: Let us take the hypothesis that the dice is unbiased. If that is so, the
probability of obtaining any one of the six numbers is 1/6 and as such the
expected frequency of any one number coming upward is 132 × 1/6 = 22. Now,
we can write the observed frequencies along with expected frequencies and
work out the value of χ² as follows:

   Number      Observed     Expected     (f_o - f_e)   (f_o - f_e)²   (f_o - f_e)²/f_e
   Turned Up   Frequency    Frequency
               (f_o)        (f_e)
   1           16           22           -6            36             36/22
   2           20           22           -2            4              4/22
   3           25           22           3             9              9/22
   4           14           22           -8            64             64/22
   5           29           22           7             49             49/22
   6           28           22           6             36             36/22

$$\sum \frac{(f_{o} - f_{e})^{2}}{f_{e}} = \frac{198}{22} = 9$$

Hence, the calculated value of χ² = 9

∴ Degrees of freedom in the given problem is (n - 1) = (6 - 1) = 5

The table value of χ² for 5 degrees of freedom at 5% level of significance is
11.071. If we compare the calculated and table values of χ², we find that the
calculated value is less than the table value and as such could have arisen due
to fluctuations of sampling. The result thus supports the hypothesis and it can
be concluded that the dice is unbiased.
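The goodness-of-fit calculation of Example 1 can be reproduced step by step. This is a minimal sketch; the function name `chi_square` is an assumption of the illustration, and the critical value 11.071 is the table value quoted in the text, not computed here.

```python
def chi_square(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [16, 20, 25, 14, 29, 28]      # frequencies for faces 1 to 6
n = sum(observed)                        # 132 throws
expected = [n / 6] * 6                   # 22 per face if the dice is unbiased

stat = chi_square(observed, expected)
df = len(observed) - 1                   # one constraint: totals are equal
critical_5pct = 11.071                   # table value for 5 df at the 5% level

reject = stat > critical_5pct
print(stat, df, reject)
```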


Example 2:

Find the value of χ² for the following information:

   Class                                  A    B    C    D    E
   Observed Frequency                     8    29   44   15   4
   Theoretical (or Expected) Frequency    7    24   38   24   7

Solution:

Since some of the frequencies are less than 10, we shall first regroup the given
data as follows and then work out the value of χ²:

   Class      Observed Frequency   Expected Frequency   (f_o - f_e)   (f_o - f_e)²/f_e
              (f_o)                (f_e)
   A and B    (8 + 29) = 37        (7 + 24) = 31        6             36/31
   C          44                   38                   6             36/38
   D and E    (15 + 4) = 19        (24 + 7) = 31        -12           144/31

$$\therefore\; \chi^{2} = \sum \frac{(f_{o} - f_{e})^{2}}{f_{e}} = \frac{36}{31} + \frac{36}{38} + \frac{144}{31} = 6.75 \text{ approx.}$$


Example 3:

Two research workers classified some people in income groups on the basis of
sampling studies. Their results are as follows:

   Investigators    Poor    Middle    Rich    Total
   A                160     30        10      200
   B                140     120       40      300
   Total            300     150       50      500

Show that the sampling technique of at least one research worker is defective.

Solution:

Let us take the hypothesis that the sampling techniques adopted by the research
workers are similar (i.e., there is no difference between the techniques adopted
by the research workers). This being so, the expectation of investigator A
classifying the people in,

(i) Poor income group = (200 × 300)/500 = 120

(ii) Middle income group = (200 × 150)/500 = 60

(iii) Rich income group = (200 × 50)/500 = 20

Similarly, the expectation of investigator B classifying the people in

(i) Poor income group = (300 × 300)/500 = 180

(ii) Middle income group = (300 × 150)/500 = 90

(iii) Rich income group = (300 × 50)/500 = 30

We can now calculate the value of χ² as follows:

   Groups                              Observed     Expected     (f_o - f_e)   (f_o - f_e)²/f_e
                                       Frequency    Frequency
                                       (f_o)        (f_e)
   Investigator A
   Classifies people as poor           160          120          40            1600/120 = 13.33
   Classifies people as middle class   30           60           -30           900/60 = 15.00
   Classifies people as rich           10           20           -10           100/20 = 5.00
   Investigator B
   Classifies people as poor           140          180          -40           1600/180 = 8.89
   Classifies people as middle class   120          90           30            900/90 = 10.00
   Classifies people as rich           40           30           10            100/30 = 3.33

$$\chi^{2} = \sum \frac{(f_{o} - f_{e})^{2}}{f_{e}} = 55.56 \text{ approx.}$$

∴ Degrees of freedom = (c - 1)(r - 1)
= (3 - 1)(2 - 1) = 2

The table value of χ² for two degrees of freedom at 5% level of significance
is 5.991. The calculated value of χ² is much higher than this table value, which
means that the calculated value cannot be said to have arisen just because of
chance. It is significant. Hence, the hypothesis does not hold good. This means
that the sampling techniques adopted by the two investigators differ and are not
similar. Naturally, then, the technique of one must be superior to that of the
other.
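The expected frequencies and the χ² value for a contingency table such as the one in Example 3 can be sketched as follows. The helper name `expected_counts` is an assumption of this illustration; each expected frequency is (row total × column total) / grand total.

```python
def expected_counts(table):
    """Expected frequency of each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Rows: investigators A and B; columns: poor, middle, rich.
table = [[160, 30, 10],
         [140, 120, 40]]

expected = expected_counts(table)
stat = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(table, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(table[0]) - 1) * (len(table) - 1)   # (c - 1)(r - 1)

print(expected)
print(round(stat, 2), df)
```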


Alternative Formula for Finding the Value of Chi-Square in a (2 × 2)
Table

There is an alternative method of calculating the value of χ² in the case of a
(2 × 2) table. Let us write the cell frequencies and marginal totals in case of
a (2 × 2) table as follows:

   a          b          (a + b)
   c          d          (c + d)
   (a + c)    (b + d)    N

Then the formula for calculating the value of χ² will be stated as follows:

$$\chi^{2} = \frac{(ad - bc)^{2} \cdot N}{(a + c)(b + d)(a + b)(c + d)}$$

Where, N means the total frequency, ad means the larger cross product, bc
means the smaller cross product and (a + c), (b + d), (a + b) and (c + d)
are the marginal totals. The alternative formula is rarely used in finding out the
value of Chi-square as it is not applicable uniformly in all cases but can be used
only in a (2 × 2) contingency table.


Yates’ Correction 


F. Yates has suggested a correction in the χ² value calculated in connection with a
(2 × 2) table, particularly when cell frequencies are small (since no cell frequency
should be less than 5 in any case, though 10 is better as stated earlier) and χ²
is just on the significance level. The correction suggested by Yates is popularly
known as Yates’ correction. It involves the reduction of the deviation of
observed from expected frequencies, which of course reduces the value of χ².
The rule for correction is to adjust the observed frequency in each cell of a (2
× 2) table in such a way as to reduce the deviation of the observed from the
expected frequency for that cell by 0.5, and this adjustment is made in all the
cells without disturbing the marginal totals. The formula for finding the value of
χ² after applying Yates’ correction is written as under:

$$\chi^{2}\,(\text{corrected}) = \frac{N \cdot \left(|ad - bc| - 0.5N\right)^{2}}{(a + b)(c + d)(a + c)(b + d)}$$

In case we use the usual formula for calculating the value of Chi-square,
viz., $\chi^{2} = \sum \dfrac{(f_{o} - f_{e})^{2}}{f_{e}}$, then Yates’ correction can be applied as under:

$$\chi^{2}\,(\text{corrected}) = \frac{\left[\,|f_{o1} - f_{e1}| - 0.5\,\right]^{2}}{f_{e1}} + \frac{\left[\,|f_{o2} - f_{e2}| - 0.5\,\right]^{2}}{f_{e2}} + \cdots$$

It may again be emphasized that Yates’ correction is made only in case of a
(2 × 2) table and that too when cell frequencies are small.
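The two forms of Yates’ correction for a (2 × 2) table can be compared numerically. This is a sketch under stated assumptions: the function names and the cell frequencies a = 10, b = 20, c = 30, d = 40 are hypothetical, and the clamp at zero (for the case |ad - bc| < 0.5N) is a convention of this illustration rather than something stated in the text.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for a 2x2 table from cell frequencies, optionally Yates-corrected."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Assumption of this sketch: never let the corrected deviation go negative.
        diff = max(diff - 0.5 * n, 0.0)
    return n * diff**2 / ((a + b) * (c + d) * (a + c) * (b + d))

def chi_square_cellwise_yates(a, b, c, d):
    """Same correction applied cell by cell: sum of (|f_o - f_e| - 0.5)^2 / f_e."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    total = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / n
            total += (abs(observed[i][j] - e) - 0.5) ** 2 / e
    return total

# Hypothetical cell frequencies, chosen only for illustration.
plain = chi_square_2x2(10, 20, 30, 40)
corrected = chi_square_2x2(10, 20, 30, 40, yates=True)
cellwise = chi_square_cellwise_yates(10, 20, 30, 40)
print(plain, corrected, cellwise)
```

For this table the two corrected forms agree, and the corrected value is smaller than the uncorrected one, as the text describes.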


Chi-Square as a Test of Population Variance 


χ² is used, at times, to test the significance of population variance $(\sigma_{p})^{2}$ through
confidence intervals. This, in other words, means that we can use the χ² test to
judge if a random sample has been drawn from a normal population with mean
(μ) and with specified variance $(\sigma_{p})^{2}$. In such a situation, the test statistic for
a null hypothesis will be as under:

$$\chi^{2} = \sum \frac{(X_{i} - \bar{X})^{2}}{(\sigma_{p})^{2}} = \frac{n(\sigma_{s})^{2}}{(\sigma_{p})^{2}}$$

with (n - 1) degrees of freedom.

By comparing the calculated value (with the help of the above formula) with the
table value of χ² for (n - 1) df at a certain level of significance, we may accept
or reject the null hypothesis. If the calculated value is equal to or less than the table
value, the null hypothesis is to be accepted, but if the calculated value is greater
than the table value, the hypothesis is rejected. All this can be made clear by
an example.


Example 4: 

Weight of 10 students is as follows:

Sl. No.         1    2    3    4    5    6    7    8    9    10
Weight in kg.   38   40   45   53   47   43   55   48   52   49


Can we say that the variance of the distribution of weights of all students from 
which the above sample of 10 students was drawn is equal to 20 square kg? 
Test this at 5% and 1% level of significance. 


Solution: 
First of all, we should work out the standard deviation of the sample (σ_s).

Calculation of the sample standard deviation:

Sl. No.   X_i (Weight in kg)   (X_i − X̄_s)   (X_i − X̄_s)²
1         38                   −9            81
2         40                   −7            49
3         45                   −2            04
4         53                   +6            36
5         47                   +0            00
6         43                   −4            16
7         55                   +8            64
8         48                   +1            01
9         52                   +5            25
10        49                   +2            04
n = 10    ΣX_i = 470                         Σ(X_i − X̄_s)² = 280

\[ \bar{X}_s = \frac{\sum X_i}{n} = \frac{470}{10} = 47 \text{ kg} \]

\[ \sigma_s = \sqrt{\frac{\sum (X_i - \bar{X}_s)^2}{n}} = \sqrt{\frac{280}{10}} = \sqrt{28} = 5.3 \text{ kg} \]

\[ (\sigma_s)^2 = 28 \]

Taking the null hypothesis as H₀: (σ_p)² = (σ_s)²,

The test statistic

\[ \chi^2 = \frac{n(\sigma_s)^2}{(\sigma_p)^2} = \frac{10 \times 28}{20} = \frac{280}{20} = 14 \]

Degrees of freedom in this case is (n − 1) = 10 − 1 = 9.


At 5% level of significance, the table value of χ² = 16.92, and at 1% level of
significance it is 21.67 for 9 df, and both these values are greater than the
calculated value of χ², which is 14. Hence, we accept the null hypothesis and
conclude that the variance of the given distribution can be taken as 20 square
kg at 5% as well as at 1% level of significance.
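The arithmetic of Example 4 can be retraced in a few lines. This is a minimal sketch: the variable names are assumptions, and the two critical values are simply the table values quoted in the example, not computed here.

```python
weights = [38, 40, 45, 53, 47, 43, 55, 48, 52, 49]  # sample from Example 4
hypothesised_variance = 20.0                        # variance under H0, in square kg

n = len(weights)
mean = sum(weights) / n
sum_sq_dev = sum((x - mean) ** 2 for x in weights)
sample_variance = sum_sq_dev / n                    # divisor n, as in the text

chi2 = n * sample_variance / hypothesised_variance
df = n - 1

# Table values of chi-square for 9 df, as quoted in the example.
table_values = {"5%": 16.92, "1%": 21.67}
decisions = {
    level: "accept H0" if chi2 <= critical else "reject H0"
    for level, critical in table_values.items()
}

print(mean, sum_sq_dev, chi2, df)  # 47.0 280.0 14.0 9
print(decisions)
```

Since 14 is below both table values, the sketch reports acceptance of the null hypothesis at both levels, in line with the worked solution.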


Additive Property of Chi-Square (χ²)


An important property of χ² is its additive nature. This means that several values
of χ² can be added together and, if the degrees of freedom are also added, this
number gives the degrees of freedom of the total value of χ². Thus, if a number
of χ² values have been obtained from a number of samples of similar data, then,
because of the additive nature of χ², we can combine the various values of χ²
by just simply adding them. Such addition of various values of χ² gives one
value of χ² which helps in forming a better idea about the significance of the
problem under consideration. The following example illustrates the additive
property of χ².

Example 5: The following values of χ² are obtained from different
investigations carried out to examine the effectiveness of a recently invented
medicine for checking malaria.


Investigation    χ²     df
1                2.5    1
2                3.2    1
3                4.1    1
4                3.7    1
5                4.5    1


What conclusion would you draw about the effectiveness of the new medicine 
on the basis of the five investigations taken together? 


Solution: By adding all the values of χ², we obtain a value equal to 18.0. Also,
by adding the various d.f. as given in the question, we obtain a figure 5. We can
now state that the value of χ² for 5 degrees of freedom (when all the five
investigations are taken together) is 18.0.



Let us take the hypothesis that the new medicine is not effective. The table value
of χ² for 5 degrees of freedom at 5% level of significance is 11.070. But our
calculated value is higher than this table value, which means that the difference
is significant and is not due to chance. As such, the hypothesis is rejected and it
can be concluded that the new medicine is effective in checking malaria.
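The pooling in Example 5 amounts to two sums and one comparison. A minimal sketch follows; the list layout and names are assumptions, and 11.070 is the table value quoted in the solution.

```python
# (chi-square value, degrees of freedom) for the five investigations of Example 5
investigations = [(2.5, 1), (3.2, 1), (4.1, 1), (3.7, 1), (4.5, 1)]

total_chi2 = sum(value for value, _ in investigations)
total_df = sum(df for _, df in investigations)

critical_5pct = 11.070  # table value for 5 df at the 5% level, as quoted in the text
reject_h0 = total_chi2 > critical_5pct

print(round(total_chi2, 1), total_df, reject_h0)  # 18.0 5 True
```

Note that no single investigation need be significant on its own for the pooled value to exceed the table value.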


Important Characteristics of Chi-Square (χ²) Test


(i) This test is based on frequencies and not on the parameters like mean and 
standard deviation. 


(ii) This test is used for testing the hypothesis and is not useful for estimation. 
(iii) This test possesses the additive property. 


(iv) This test can also be applied to a complex contingency table with several 
classes and as such is a very useful test in research work. 


(v) This test is an important non-parametric (or distribution-free) test, as no
rigid assumptions are necessary in regard to the type of population and the
parameter values are not needed. It involves fewer mathematical details.


A Word of Caution in Using χ² Test


Chi-square test is no doubt a most frequently used test but its correct application 
is equally an uphill task. It should be borne in mind that the test is to be applied 
only when the individual observations of sample are independent which means 
that the occurrence of one individual observation (event) has no effect upon the 
occurrence of any other observation (event) in the sample under consideration. 
The researcher, while applying this test, must remain careful about all these things 
and must thoroughly understand the rationale of this important test before using it 
and drawing inferences concerning his hypothesis. 


Check Your Progress 


3. What is a chi-square test? 
4. What do you mean by degrees of freedom? 


5. What conditions should be satisfied before the application of Chi-square 
test? 


6. What are the areas of application in which Chi-square test is applied? 
7. What do you mean by goodness of fit? 
8. What are the steps involved in Chi-square test? 
9. Is there any alternative formula to find the value of Chi-square? 
10. What is Yates’ correction method? 


11. Explain the additive property of Chi-square. 


12. What are the important characteristics of Chi-square test? 


7.4 THE NORMAL DISTRIBUTION 


Among all the probability distributions the normal probability distribution is by far 
the most important and frequently used continuous probability distribution. This is 
so because this distribution well fits in many types of problems. This distribution is 
of special significance in inferential statistics since it describes probabilistically the 
link between a statistic and a parameter (i.e., between the sample results and the 
population from which the sample is drawn). The name of Karl Gauss, eighteenth 
century mathematician-astronomer, is associated with this distribution and in honour 
of his contribution, this distribution is often known as the Gaussian distribution. 


The normal distribution can be theoretically derived as the limiting form of many
discrete distributions. For instance, if in the binomial expansion of (p + q)^n the
value of ‘n’ is infinity and p = q = 1/2, then a perfectly smooth symmetrical curve
would be obtained. Even if the values of p and q are not equal, but the value of
the exponent ‘n’ happens to be very, very large, we get a smooth and symmetrical
curve of normal probability. Such curves are called normal probability curves (or at
times known as normal curves of error) and such curves represent the normal
distributions.


The probability function in case of the normal probability distribution is given as:

\[ f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^2} \]

Where, μ = The mean of the distribution.
       σ² = Variance of the distribution.


The normal distribution is thus defined by two parameters viz., μ and σ². This
distribution can be represented graphically as under:


[Figure: normal curve with the horizontal axis marked at −3σ, −2σ, −1σ, μ, +1σ, +2σ, +3σ]


Fig. 7.1 Curve Representing Normal Distribution 


Characteristics of Normal Distribution 


The characteristics of the normal distribution or that of the normal curve are as given
below:
1. It is a symmetric distribution.
2. The mean μ defines where the peak of the curve occurs. In other words, the
ordinate at the mean is the highest ordinate. The height of the ordinate at a
distance of one standard deviation from the mean is 60.653% of the height of the
mean ordinate, and similarly the height of other ordinates at various standard
deviations (σ) from the mean bears a fixed relationship with the height of
the mean ordinate.


3. The curve is asymptotic to the base line, which means that it continues to
approach but never touches the horizontal axis.


4. The variance (σ²) defines the spread of the curve.


5. Area enclosed between the mean ordinate and an ordinate at a distance of one
standard deviation from the mean is always 34.134% of the total area of the
curve. It means that the area enclosed between two ordinates at one sigma (S.D.)
distance from the mean on either side would always be 68.268% of the total
area, i.e., (34.134% + 34.134%) = 68.268% of the total area of the curve lies
between μ ± 1(σ).

[Figure: normal curve with the area between μ − 1σ and μ + 1σ shaded; horizontal axis marked at −3σ, −2σ, −σ, μ, +σ, +2σ, +3σ]


Similarly, the other area relationships are as follows: 


Between              Area Covered to Total Area of the Normal Curve
μ ± 1 S.D.           68.27%
μ ± 2 S.D.           95.45%
μ ± 3 S.D.           99.73%
μ ± 1.96 S.D.        95%
μ ± 2.576 S.D.       99%
μ ± 0.6745 S.D.      50%


6. The normal distribution has only one mode since the curve has a single peak. In 
other words, it is always a unimodal distribution. 


7. The maximum ordinate divides the graph of normal curve into two equal parts. 
8. In addition to all the above stated characteristics the curve has the following 
properties: 
(a) μ = X̄
(b) μ₂ = σ² = Variance
(c) μ₄ = 3σ⁴
(d) Moment Coefficient of Kurtosis =3 
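The area and ordinate relationships listed above can be checked without a printed table. The sketch below relies on the standard identity that the area of a normal curve within k standard deviations of the mean equals erf(k/√2); the function name is an assumption made for this illustration, and 2.576 is used as the multiple for the 99% interval.

```python
import math


def area_within(k):
    """Share of the total area lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


multiples = [1, 2, 3, 1.96, 2.576, 0.6745]
areas = {k: area_within(k) for k in multiples}

# Height of the ordinate one standard deviation from the mean,
# relative to the ordinate at the mean: exp(-1/2).
ordinate_ratio = math.exp(-0.5)

for k, area in areas.items():
    print(f"mean +/- {k} S.D.: {100 * area:.2f}%")
print(f"ordinate ratio at 1 S.D.: {100 * ordinate_ratio:.3f}%")
```

Half of the one-sigma area, about 34.134%, is the area between the mean ordinate and the ordinate one standard deviation away.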
Family of Normal Distributions


We can have several normal probability distributions, but each particular normal
distribution is defined by its two parameters viz., the mean (μ) and the standard
deviation (σ). There is, thus, not a single normal curve but rather a family of normal
curves. We can exhibit some of these as under:


Normal curves with identical means but different standard deviations:

[Figure: three normal curves centred on the same μ: a curve having a small standard deviation, say σ = 1; a curve having a large standard deviation, say σ = 5; and a curve having a very large standard deviation, say σ = 10]


Normal curves with identical standard deviation but each with different means:

[Figure: Curve A with the smallest mean (μ = 15); Curve B with mean between the means of curve A and curve C (μ = 30); Curve C with the largest mean (μ = 50)]


Normal curves each with different standard deviations and different means:

[Figure: a curve with smaller mean and smaller standard deviation (μ = 5); a curve with larger mean and larger standard deviation (μ = 15); a curve with very large mean and very large standard deviation (μ = 30)]

How to Measure the Area under the Normal Curve? 


We have stated above some of the area relationships involving certain intervals of 
standard deviations (plus and minus) from the means that are true in case of a 
normal curve. But what should be done in all other cases? We can make use of the 
statistical tables constructed by mathematicians for the purpose. Using these tables 
we can find the area (or probability, taking the entire area of the curve as equal to 
1) that the normally distributed random variable will lie within certain distances 
from the mean. These distances are defined in terms of standard deviations. While 
using the tables showing the area under the normal curve we talk in terms of 
standard variate (symbolically Z), which really means standard deviations without
units of measurement, and this ‘Z’ is worked out as under:

\[ Z = \frac{X - \mu}{\sigma} \]

Where, Z = The standard variate (or number of standard deviations from X to the
mean of the distribution).


X = Value of the random variable under consideration.
μ = Mean of the distribution of the random variable.
σ = Standard deviation of the distribution.


The table showing the area under the normal curve (often termed as the standard 
normal probability distribution table) is organized in terms of standard variate (or 
Z) values. It gives the values for only half the area under the normal curve, beginning 
with Z= 0 at the mean. Since the normal distribution is perfectly symmetrical the 
values true for one half of the curve are also true for the other half. We now 
illustrate the use of such a table for working out certain problems. 


Example 6: A banker claims that the life of a regular savings account opened with
his bank averages 18 months with a standard deviation of 6.45 months. Answer
the following: (a) What is the probability that there will still be money in 22 months
in a savings account opened with the said bank by a depositor? (b) What is the
probability that the account will have been closed before two years?


Solution: (a) For finding the required probability we are interested in the area of
the portion of the normal curve to the right of X = 22. Here,

\[ Z = \frac{22 - 18}{6.45} = 0.62 \]

The value from the table showing the area under the normal curve for Z = 0.62 is
0.2324. This means that the area of the curve between μ = 18 and X = 22 is
0.2324. Hence, the area of the shaded portion of the curve is (0.5) − (0.2324) =
0.2676, since the area of the entire right hand portion of the curve always happens
to be 0.5. Thus the probability that there will still be money in 22 months in a
savings account is 0.2676.

(b) For finding the required probability we are interested in the area of the portion
of the normal curve to the left of X = 24. Here,

\[ Z = \frac{24 - 18}{6.45} = 0.93 \]

The value from the concerning table, when Z = 0.93, is 0.3238, which refers to the
area of the curve between μ = 18 and X = 24. The area of the entire left hand
portion of the curve is 0.5 as usual.
Hence, the area of the shaded portion is (0.5) + (0.3238) = 0.8238, which is the
required probability that the account will have been closed before two years, i.e.,
before 24 months.
Example 7: Regarding a certain normal distribution concerning the income of the
individuals we are given that mean = 500 rupees and standard deviation = 100 rupees.
Find the probability that an individual selected at random will belong to income
group,

(a) Rs 550 to Rs 650 (b) Rs 420 to Rs 570
Solution: (a) For finding the required probability we are interested in the area of
the portion of the normal curve as shaded and shown below:

[Figure: normal curve with μ = 500 (Z = 0) and σ = 100; the area between X = 550 and X = 650 is shaded]

For finding the area of the curve between X = 550 and X = 650, let us do the following
calculations:

\[ Z = \frac{550 - 500}{100} = \frac{50}{100} = 0.50 \]

Corresponding to which the area between μ = 500 and X = 550 in the curve, as
per table, is equal to 0.1915 and,

\[ Z = \frac{650 - 500}{100} = \frac{150}{100} = 1.5 \]

Corresponding to which the area between μ = 500 and X = 650 in the curve, as
per table, is equal to 0.4332.


Hence, the area of the curve that lies between X = 550 and X = 650 is,
(0.4332) − (0.1915) = 0.2417


This is the required probability that an individual selected at random will belong to
income group of Rs 550 to Rs 650.


(b) For finding the required probability we are interested in the area of the portion
of the normal curve as shaded and shown below:

[Figure: normal curve with μ = 500 (Z = 0) and σ = 100; the area between X = 420 and X = 570 is shaded]

To find the area of the shaded portion we make the following calculations:

\[ Z = \frac{570 - 500}{100} = 0.70 \]

Corresponding to which the area between μ = 500 and X = 570 in the curve, as
per table, is equal to 0.2580.

And \[ Z = \frac{420 - 500}{100} = -0.80 \]

Corresponding to which the area between μ = 500 and X = 420 in the curve, as
per table, is equal to 0.2881.


Hence, the required area in the curve between X = 420 and X = 570 is,
(0.2580) + (0.2881) = 0.5461


This is the required probability that an individual selected at random will belong to
income group of Rs 420 to Rs 570.
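The table look-ups in Examples 6 and 7 can be cross-checked with the `NormalDist` class of Python's standard `statistics` module, which works with the full cumulative area rather than the half-curve area tabulated in the text. This is a sketch only: the helper `prob_between` and the variable names are assumptions introduced for illustration.

```python
from statistics import NormalDist


def prob_between(dist, low, high):
    """Area under the normal curve of `dist` between low and high."""
    return dist.cdf(high) - dist.cdf(low)


# Example 6: life of a savings account, mean 18 months, S.D. 6.45 months.
account_life = NormalDist(mu=18, sigma=6.45)
p_open_after_22 = 1 - account_life.cdf(22)   # text: 0.2676
p_closed_before_24 = account_life.cdf(24)    # text: 0.8238

# Example 7: income with mean Rs 500 and S.D. Rs 100.
income = NormalDist(mu=500, sigma=100)
p_550_650 = prob_between(income, 550, 650)   # text: 0.2417
p_420_570 = prob_between(income, 420, 570)   # text: 0.5461

print(round(p_open_after_22, 4), round(p_closed_before_24, 4))
print(round(p_550_650, 4), round(p_420_570, 4))
```

Small differences in the fourth decimal place are to be expected, because the printed table is entered with Z rounded to two decimals.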


Example 8: A certain company manufactures 1½″ all-purpose rope made from
imported hemp. The manager of the company knows that the average load-bearing
capacity of the rope is 200 lbs. Assuming that normal distribution applies, find the
standard deviation of load-bearing capacity for the 1½″ rope if it is given that the
rope has a 0.1210 probability of breaking with 68 lbs. or less pull.


Solution: Given information can be depicted in a normal curve as shown below:

[Figure: normal curve with μ = 200 (Z = 0) and σ = ? (to be found out); the shaded area to the left of X = 68 (68 lbs. or less) has probability 0.1210 as given, and the area between X = 68 and μ = 200 has probability (0.5) − (0.1210) = 0.3790]

If the probability of the area falling within μ = 200 and X = 68 is 0.3790 as stated
above, the corresponding value of Z as per the table showing the area of the
normal curve is −1.17 (minus sign indicates that we are in the left portion of the
curve).


Now to find σ, we can write,

\[ Z = \frac{X - \mu}{\sigma} \]

Or \[ -1.17 = \frac{68 - 200}{\sigma} \]

Or −1.17σ = −132
Or σ = 112.8 lbs. approx.
Thus, the required standard deviation is 112.8 lbs. approximately.


Example 9: In a normal distribution, 31 per cent items are below 45 and 8 per
cent are above 64. Find the X̄ and σ of this distribution.


Solution: We can depict the given information in a normal curve as shown below:

[Figure: normal curve with X̄ (or μ) and σ to be found out; the shaded area below X = 45 has probability 0.31 as given and the shaded area above X = 64 has probability 0.08 as given; the area between μ and X = 45 is (0.5) − (0.31) = 0.19 and the area between μ and X = 64 is (0.5) − (0.08) = 0.42]

If the probability of the area falling within μ and X = 45 is 0.19 as stated above, the
corresponding value of Z from the table showing the area of the normal curve is
−0.50. Since we are in the left portion of the curve, we can express this as under,

\[ -0.50 = \frac{45 - \mu}{\sigma} \qquad (1) \]

Similarly, if the probability of the area falling within μ and X = 64 is 0.42, as stated
above, the corresponding value of Z from the area table is +1.41. Since we are
in the right portion of the curve, we can express this as under,

\[ 1.41 = \frac{64 - \mu}{\sigma} \qquad (2) \]

If we solve Equations (1) and (2) above to obtain the value of μ or X̄, we have,
−0.50σ = 45 − μ     (3)
1.41σ = 64 − μ      (4)
By subtracting Equation (4) from Equation (3) we have,
−1.91σ = −19
σ = 10 (approximately)
Putting σ = 10 in Equation (3) we have,
−5 = 45 − μ
μ = 50


Hence, X̄ (or μ) = 50 and σ = 10 for the concerning normal distribution.
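Examples 8 and 9 read the area table in reverse: a probability is given and the Z value is looked up. A minimal sketch of the same steps, assuming the inverse look-up is done with `NormalDist.inv_cdf` from Python's standard `statistics` module rather than a printed table (variable names are illustrative only):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Example 8: P(X <= 68) = 0.1210 with mean 200; solve Z = (68 - 200) / sigma.
z_68 = std_normal.inv_cdf(0.1210)        # about -1.17
sigma_rope = (68 - 200) / z_68

# Example 9: 31% of items below 45 and 8% above 64.
z_low = std_normal.inv_cdf(0.31)         # about -0.50
z_high = std_normal.inv_cdf(1 - 0.08)    # about +1.41
sigma = (64 - 45) / (z_high - z_low)     # subtracting the two equations
mu = 45 - z_low * sigma                  # back-substitution

print(round(sigma_rope, 1))
print(round(mu, 1), round(sigma, 1))
```

The results agree with the worked answers (about 112.8 lbs., and mean 50 with standard deviation 10) up to the rounding of Z to two decimals in the table.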


7.5 THE BIVARIATE NORMAL DISTRIBUTION 


Binomial, Poisson, negative binomial and uniform distributions are some of the
discrete probability distributions. The random variables in these distributions assume
a finite or enumerably infinite number of values, but in nature there are random
variables which take an infinite number of values, i.e., these variables can take any
value in an interval. Such variables are known as continuous random variables and
their probability distributions as continuous probability distributions.

A random variable X is said to be normally distributed if it has the following
probability density function:

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad \text{for } -\infty < x < \infty \]

where μ and σ > 0 are the parameters of the distribution.


Normal Curve: A curve given by,

\[ y = y_0\, e^{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2} \]

is known as the normal curve when the origin is taken at the mean.


Fig. 7.2 Normal Curve


Standard Normal Variate: A normal variate with mean zero and standard
deviation unity is called a standard normal variate.

That is, if X is a standard normal variate then E(X) = 0 and V(X) = 1.

Then, X ~ N(0, 1)

The moment generating function or MGF of a standard normal variate is
given as follows:

\[ M_X(t) = \left. e^{\mu t + \frac{1}{2}t^2\sigma^2} \right|_{\mu = 0,\ \sigma = 1} = e^{\frac{1}{2}t^2} \]


Frequently, a change of variable in the integral

\[ \frac{1}{\sigma\sqrt{2\pi}} \int e^{-(x-\mu)^2/2\sigma^2}\, dx \]

is used by introducing the following new variable:

\[ Z = \frac{X - \mu}{\sigma} \sim N(0, 1) \]

This new random variable Z simplifies calculations of probabilities etc.
concerning normally distributed variates.


Standard Normal Distribution: The distribution of a random variable
Z = (X − μ)/σ, which is known as the standard normal variate, is called the
standard normal distribution or unit normal distribution, where X has a normal
distribution with mean μ and variance σ².


The density function of Z is given as follows:

\[ \phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}, \quad -\infty < z < \infty \]

with mean 0, variance one and MGF e^{t²/2}. The normal distribution is the most
frequently used distribution in statistics. Its importance is highlighted by the
central limit theorem, by its mathematical properties, and by the many measurements,
such as height, weight, the blood pressure of normal individuals, heart diameter,
etc., that follow a normal distribution if the number of observations is very large.
The normal distribution also has great importance in statistical inference theory.


Examples of Normal Distribution: 


1. The heights of men of mature age belonging to the same race and living in
similar environments provide a normal frequency distribution.


2. The heights of trees of the same variety and age in the same locality would
conform to the laws of the normal curve.


3. The lengths of the leaves of a tree form a normal frequency distribution. Though
some of them are very short and some are long, yet they tend towards
their mean length.


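Example 10: X has normal distribution with μ = 50 and σ² = 25. Find out
(i) The approximate value of the probability density function for X = 50
(ii) The value of the distribution function for x = 50.

Solution: (i) \[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}, \quad -\infty < x < \infty \]

For X = 50, σ² = 25, μ = 50, you have

\[ f(50) = \frac{1}{5\sqrt{2\pi}} \approx 0.08. \]

(ii) Distribution function F(x):

\[ F(x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(t-\mu)^2/2\sigma^2}\, dt \]

\[ F(50) = \int_{-\infty}^{50} \frac{1}{5\sqrt{2\pi}}\, e^{-(t-50)^2/50}\, dt = 0.5, \]

since x = 50 is the mean and the curve is symmetrical about it.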


Example 11: If X is a normal variable with mean 8 and standard deviation 4, find
(i) P[X ≤ 5]
(ii) P[5 ≤ X ≤ 10]
Solution: (i) P[X ≤ 5] = P((X − μ)/σ ≤ (5 − 8)/4)
= P(Z ≤ −0.75)
= P(Z ≥ 0.75)
[By Symmetry]
= 0.5 − P(0 ≤ Z ≤ 0.75)
[To use relevant table]
= 0.5 − 0.2734 [See Appendix for value of ‘Z’]
= 0.2266.

[Figure: standard normal curve with X = μ at Z = 0; the tail beyond Z = 0.75 is shaded and each half of the curve has area 0.5]

(ii) P[5 ≤ X ≤ 10] = P((5 − 8)/4 ≤ Z ≤ (10 − 8)/4)

= P(−0.75 ≤ Z ≤ 0.5)

= P(−0.75 ≤ Z ≤ 0) + P(0 ≤ Z ≤ 0.5)

= P(0 ≤ Z ≤ 0.75) + P(0 ≤ Z ≤ 0.5)

= 0.2734 + 0.1915 [See Appendix]
= 0.4649.


Example 12: X is a normal variate with mean 30 and S.D. 5. Find
(i) P[26 ≤ X ≤ 40]
(ii) P[|X − 30| > 5]

Solution: Here μ = 30, σ = 5.

[Figure: normal curve for P[26 ≤ X ≤ 40], marked at X = 26 (Z = −0.8), X = μ (Z = 0) and X = 40 (Z = 2)]

(i) When X = 26, Z = (X − μ)/σ = (26 − 30)/5 = −0.8
And for X = 40, Z = (40 − 30)/5 = 2
P[26 ≤ X ≤ 40] = P[−0.8 ≤ Z ≤ 2]
= 0.2881 + 0.4772 = 0.7653
(ii) P[|X − 30| > 5] = 1 − P[|X − 30| ≤ 5]
P[|X − 30| ≤ 5] = P[25 ≤ X ≤ 35]

= P((25 − 30)/5 ≤ Z ≤ (35 − 30)/5)

= P(−1 ≤ Z ≤ 1)

= 2 P(0 ≤ Z ≤ 1)
= 2 × 0.3413 = 0.6826.

So P[|X − 30| > 5] = 1 − P[|X − 30| ≤ 5]
= 1 − 0.6826 = 0.3174.


Central Limit Theorem 


Let X₁, X₂, ..., X_n be n independent random variables all of which have the
same distribution. Let the common expectation and variance be μ and σ²,
respectively.

Let \[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \]

Then, the distribution of X̄ approaches the normal distribution with mean μ
and variance σ²/n as n → ∞.

That is, the variate

\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]

has, in the limit, the standard normal distribution.


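Proof: The moment generating function of Z about the origin is given as follows:

\[ M_Z(t) = E(e^{tZ}) = E\left[ e^{t\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)} \right] \]

\[ = e^{-\frac{\mu t\sqrt{n}}{\sigma}}\, E\left[ e^{\frac{t\sqrt{n}}{\sigma}\bar{X}} \right] \]

\[ = e^{-\frac{\mu t\sqrt{n}}{\sigma}}\, E\left[ e^{\frac{t}{\sigma\sqrt{n}}(X_1 + X_2 + \cdots + X_n)} \right] \]

\[ = e^{-\frac{\mu t\sqrt{n}}{\sigma}} \left[ M_X\!\left( \frac{t}{\sigma\sqrt{n}} \right) \right]^n \]

This is because the random variables are independent and have the same
MGF. By using logarithms, you have:

\[ \log M_Z(t) = -\frac{\mu t\sqrt{n}}{\sigma} + n \log M_X\!\left( \frac{t}{\sigma\sqrt{n}} \right) \]

\[ = -\frac{\mu t\sqrt{n}}{\sigma} + n \log\left[ 1 + \frac{\mu'_1 t}{\sigma\sqrt{n}} + \frac{\mu'_2}{2!}\left( \frac{t}{\sigma\sqrt{n}} \right)^2 + \cdots \right] \]

\[ = -\frac{\mu t\sqrt{n}}{\sigma} + n\left[ \frac{\mu'_1 t}{\sigma\sqrt{n}} + \frac{\mu'_2 t^2}{2!\, n\sigma^2} - \frac{1}{2}\left( \frac{\mu'_1 t}{\sigma\sqrt{n}} \right)^2 + \cdots \right] \]

\[ = -\frac{\mu t\sqrt{n}}{\sigma} + \frac{\mu t\sqrt{n}}{\sigma} + \frac{\mu'_2 t^2}{2\sigma^2} - \frac{\mu^2 t^2}{2\sigma^2} + O(n^{-1/2}) \]

\[ = \frac{t^2}{2} + O(n^{-1/2}) \qquad [\because\ \mu'_1 = \mu,\ \mu'_2 - \mu^2 = \sigma^2] \]

Hence, as n → ∞,

\[ \log M_Z(t) \to \frac{t^2}{2}, \quad \text{i.e., } M_Z(t) \to e^{t^2/2} \]

However, this is the MGF of a standard normal random variable. Thus,
the random variable Z converges to N(0, 1).

It follows that the limiting distribution of X̄ is normal with mean μ and
variance σ²/n.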
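The statement of the central limit theorem can also be illustrated by simulation. The sketch below is an assumption-laden illustration, not part of the proof: it draws samples from a uniform distribution on (0, 1), whose mean is 1/2 and variance 1/12, and the sample size, number of replicates and seed are arbitrary choices.

```python
import random
import statistics


def standardised_means(n, replicates, seed=0):
    """Return Z = (sample mean - mu) / (sigma / sqrt(n)) for repeated uniform samples."""
    rng = random.Random(seed)
    mu = 0.5                   # mean of the uniform(0, 1) distribution
    sigma = (1 / 12) ** 0.5    # its standard deviation
    z_values = []
    for _ in range(replicates):
        sample_mean = sum(rng.random() for _ in range(n)) / n
        z_values.append((sample_mean - mu) / (sigma / n ** 0.5))
    return z_values


z = standardised_means(n=30, replicates=5000)
z_mean = statistics.fmean(z)
z_var = statistics.pvariance(z)
share_within_196 = sum(abs(value) <= 1.96 for value in z) / len(z)

print(round(z_mean, 3), round(z_var, 3), round(share_within_196, 3))
```

Although each observation is uniformly distributed, the standardised sample means should show a mean near 0, a variance near 1 and roughly 95% of values within ±1.96, as the standard normal distribution predicts.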


Normal Approximation 


Under certain circumstances one can use the normal distribution to approximate a
binomial distribution as well as a Poisson distribution.
If X ~ B(n, p) and the value of n is quite large with p quite close to ½, then X
can be approximated as N(np, npq), where q = 1 − p.


There are cases in which the normal distribution is found to be easier to use
than the Binomial distribution.


Also, as already mentioned, the normal distribution may be utilized for
approximating the Poisson distribution when the value of λ is large. Here, λ is the mean
of the Poisson distribution.


Thus, X ~ Po(λ) ⇒ X ~ N(λ, λ) approximately when the value of λ is large.
Continuity Correction 
The normal distribution is continuous, whereas both the Binomial and the Poisson
distributions are discrete. This fact has to be kept in mind while
making use of the normal distribution for approximating a Binomial or Poisson
distribution, and a continuity correction has to be used.


Each probability, in case of discrete distribution, is represented using a 
rectangle as shown in the Figure 7.2(b). 


Fig. 7.2 (a) Continuous Distribution 


Fig. 7.3 (b) Discrete Distribution 
While working out probabilities, we include the whole rectangles
when applying the continuity correction.


Example 13: A fair coin is tossed 20 times. Find the probability of getting between 
9 and 11 heads. 


Solution: Let the random variable be represented by X that shows the number of 
heads thrown. 


X ~ Bin(20, ½)


As p is ½, the normal approximation can be used for the binomial
distribution and we may write X ~ N(20 × ½, 20 × ½ × ½), i.e., X ~ N(10, 5).


In the diagram shown below, the rectangles show the binomial distribution,
which is discrete, and the curve shows the normal distribution, which is continuous in nature.

[Figure: Using normal distribution for showing Binomial distribution; the rectangles for 9, 10 and 11 heads are shaded, the first beginning at 8.5 and the last ending at 11.5]

If it is desired to have P(9 ≤ X ≤ 11), as shown by the shaded area, one may
note that the first rectangle begins at 8.5 and the last rectangle terminates at 11.5. By
making a continuity correction, the probability becomes P(8.5 < X < 11.5) in the normal
distribution. We may standardize this, as given below:

\[ P(8.5 < X < 11.5) = P\left( \frac{8.5 - 10}{\sqrt{5}} < Z < \frac{11.5 - 10}{\sqrt{5}} \right) \]

\[ = P(-0.67 < Z < 0.67) \]

\[ = 2 \times 0.7486 - 1 \ (\text{using tables}) = 0.4972 \]
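For a problem as small as Example 13 the binomial probability can also be computed exactly, which gives a direct check on the continuity-corrected approximation. A minimal sketch using only Python's standard library (the variable names are assumptions): standardising 8.5 and 11.5 with mean 10 and standard deviation √5 gives Z of about ±0.671.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5

# Exact binomial probability of 9, 10 or 11 heads in 20 tosses.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(9, 12))

# Normal approximation N(np, npq) with the continuity correction 8.5 to 11.5.
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx = approx_dist.cdf(11.5) - approx_dist.cdf(8.5)

# Standardised upper limit, (11.5 - 10) / sqrt(5).
z_upper = (11.5 - n * p) / sqrt(n * p * (1 - p))

print(round(exact, 4), round(approx, 4), round(z_upper, 3))
```

The exact value is about 0.4966 and the approximation about 0.498, so with the continuity correction the two agree to within roughly 0.001 even for n = 20.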


Check Your Progress 
13. What is a normal distribution? 
14. Explain any four characteristics of normal distribution. 


15. Define a standard normal variate. 


7.6 ANSWERS TO CHECK YOUR PROGRESS 
QUESTIONS 


1. Because of the factorial function in the denominator, the Erlang distribution
is only defined when the parameter k is a positive integer. In fact, this
distribution is sometimes called the Erlang-k distribution.


2. This formula is much more difficult and is often used to calculate how long
callers will have to wait before being connected to a human in a call centre
or similar situation.
3. Chi-square test is a non-parametric test of statistical significance for bivariate
tabular analysis. Chi-square tests enable us to test whether more than two
population proportions can be considered equal.


4. The degree of freedom is determined by the number of constraints. If there
are 10 frequency classes and there is one independent constraint, then there
are 10 − 1 = 9 degrees of freedom.


5. Before the application of chi-square test, the following conditions need to be 
satisfied: 
(i) Observations recorded and used are collected on a random basis. 
(ii) All the members or items in the sample must be independent.


(iii) No group should contain very few items say less than 10. In cases where 
the frequencies are less than 10, regrouping is done by combining the 
frequencies of adjoining groups so that the new frequencies become 
greater than 10. 


(iv) The overall number of items must be reasonably large; it should at least 
be 50 howsoever small the number of groups may be. 


(v) The constraints must be linear. Constraints, which involve linear equations 
in the cell frequencies of a contingency table, are known as linear 
constraints. 


6. Chi-square test is applicable in large number of problems. These include: 
(i) Testing the goodness of fit. 
(ii) Testing the homogeneity of a number of frequency distributions.
(iii) Testing the significance of association between two attributes.
(iv) Establishing hypotheses.
(v) Testing independence between two variables.
7. Goodness of fit describes how well the theoretical distribution fits
the observed data.
8. The various steps involved in the chi-square test are: 
(i) Calculation of the expected frequencies. 
(ii) Obtaining the difference between observed and expected frequencies 
and finding out the squares of these differences. 
(iii) Dividing the quantity (f_o − f_e)² obtained in the result by the corresponding
expected frequency to get (f_o − f_e)²/f_e.
(iv) Finding the summation of the (f_o − f_e)²/f_e values, which is the required
χ² value and is represented as

\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \]
9. The alternative method of calculating the value of chi-square is used in the
case of a (2 × 2) table. The cell frequencies and marginal totals of a (2 × 2)
table are written as follows:

   a          b          (a + b)
   c          d          (c + d)
   (a + c)    (b + d)    N

The alternative formula for calculating the value of χ² is:

\[ \chi^2 = \frac{(ad - bc)^2 \cdot N}{(a+c)(b+d)(a+b)(c+d)} \]

Where, N means the total frequency, ad means the larger cross product, bc
means the smaller cross product and (a + c), (b + d), (a + b) and (c + d) are
the marginal totals.


10. F. Yates has suggested a correction in the χ² value calculated in connection with
a (2 × 2) table, particularly when cell frequencies are small and χ² is just on
the significance level. The correction suggested by Yates is popularly known
as Yates’ correction. It involves the reduction of the deviation of observed
from expected frequencies, which of course reduces the value of χ². The
rule for correction is to adjust the observed frequency in each cell of a (2 × 2)
table in such a way as to reduce the deviation of the observed from the
expected frequency for that cell by 0.5, and this adjustment is made in all the
cells without disturbing the marginal totals.


11. The additive nature is an important property of χ². This means that several
values of χ² can be added together and, if the degrees of freedom are also added,
this number gives the degrees of freedom of the total value of χ². Thus, if a
number of χ² values have been obtained from a number of samples of similar data,
then because of the additive nature of χ² we can combine the various values of χ²
by just simply adding them.

12. The important characteristics of chi-square test are:


(i) This test is based on frequencies and not on the parameters like mean 
and standard deviation. 


(ii) This test is used for testing the hypothesis. 
(iii) This test possesses the additive property. 


(iv) This test can also be applied to a complex contingency table with several 
classes and as such is a very useful test in research work. 


(v) This test is an important non-parametric test as no rigid assumptions are
necessary in regard to the type of population.


13. The normal distribution is the most important and most frequently used
continuous probability distribution. It is defined by two parameters, the
mean (μ) and the variance (σ²), and its curve is smooth and symmetrical.

14. Four characteristics of the normal distribution are:

(i) It is a symmetric distribution.
(ii) The mean μ defines where the peak of the curve occurs.
(iii) The curve is asymptotic to the base line, i.e., it continues to approach
but never touches the horizontal axis.
(iv) The variance (σ²) defines the spread of the curve.

15. A normal variate with mean zero and standard deviation unity is called a 
standard normal variate. 


7.7 SUMMARY 


The Erlang distribution is a continuous probability distribution with wide 
applicability primarily due to its relation to the Exponential and Gamma 
distributions. 

The Erlang distribution, which measures the time between incoming calls, 
can be used in conjunction with the expected duration of incoming calls to 
produce information about the traffic load measured in Erlang units. 


The Erlang distribution is the distribution of the sum of k independent and 
identically distributed random variables each having an exponential 
distribution. 

Chi-square test is a non-parametric test of statistical significance for bivariate 
tabular analysis (also known as cross-breaks). Any appropriate test of 
statistical significance lets you know the degree of confidence you can have 
in accepting or rejecting a hypothesis. 


The constraints must be linear. Constraints which involve linear equations in 
the cell frequencies of a contingency table (i.e., equations containing no 
squares or higher powers of the frequencies) are known as linear constraints. 

The χ² test enables us to see how well the distribution of observed data fits the 
assumed theoretical distribution such as Binomial distribution, Poisson 
distribution or the Normal distribution. 

Chi-square is also used, at times, to test the significance of the population variance (σ²) 
through confidence intervals, especially in the case of small samples. 

The normal curve is asymptotic to the base line, which means that it continues to 
approach but never touches the horizontal axis. 

The normal distribution has only one mode since the curve has a single 
peak. In other words, it is always a unimodal distribution. 


The distribution of a random variable Z = (X − μ)/σ, which is known as the 
standard normal variate, is called the standard normal distribution or unit 
normal distribution, where X has a normal distribution with mean μ and 
variance σ². 

Under certain circumstances one can use the normal distribution to approximate 
a binomial distribution as well as a Poisson distribution. 


The normal distribution is continuous, whereas both the Binomial and the Poisson 
distributions are discrete. This fact has to be kept in mind while making use of 
the normal distribution for approximating a Binomial or Poisson distribution, and 
a continuity correction should be applied. 


7.8 KEY WORDS 


Erlang B distribution: This is the easier of the two distributions and can 
be used in a call centre to calculate the number of trunks one needs to carry 
a certain amount of phone traffic with a certain ‘target service’. 


Degrees of freedom: The number of independent constraints determines 
the number of degrees of freedom (or df). If there are 10 frequency classes 
and there is one independent constraint, then there are (10 − 1) = 9 degrees 
of freedom. 


7.9 SELF-ASSESSMENT QUESTIONS AND EXERCISES 


Short-Answer Questions 


1. What is meant by Gamma function? 
2. Explain stochastic process. 
3. Explain chi-square test. 
4. Why is it considered an important test in statistical analysis? 
5. Describe the term ‘Degrees of Freedom’. 
6. Define the necessary conditions required for the application of the χ² test. 
7. What are the areas of application of chi-square test? 
8. How will you find the value of chi-square? 
9. Define Yates’ correction formula for chi-square. 
10. Chi-square can be used as a test of population variance. Explain. 
11. Describe the additive properties of chi-square. 
12. Explain the important characteristics of chi-square test. 
13. Give the characteristics of normal distribution. 
14. Explain some examples of normal distribution. 
15. Define central limit theorem. 

Long-Answer Questions 


1. Briefly discuss the Gamma distribution. 
2. What is Chi-square test? Explain its significance in statistical analysis. 
3. Write short notes on the following: 
(i) Additive property of chi-square 
(ii) Chi-square as a test of ‘goodness of fit’ 
(iii) Precautions in applying chi-square test 
(iv) Conditions for applying chi-square test 


4. On the basis of the information given below about the treatment of 200 patients 
suffering from a disease, state whether the new treatment is comparatively 
superior to the conventional treatment. 

Treatment       No. of Patients 
                Favourable Response    No Response 
New             60                     20 
Conventional    70                     50 

For drawing your inference use the value of χ² for one degree of freedom at 
the 5% level of significance, viz., 3.841. 

5. 200 digits were chosen at random from a set of tables. The frequencies of 
the digits were: 

Digit        0    1    2    3    4    5    6    7    8    9 
Frequency   18   19   23   21   16   25   22   20   21   15 

Calculate χ². 


6. The normal rate of infection for a certain disease in cattle is known to be 
50%. In an experiment with seven animals injected with a new vaccine it 
was found that none of the animals caught infection. Can the evidence be 
regarded as conclusive (at 1% level of significance) to prove the value of the 
new vaccine? 


7. The results of throwing a die were recorded as follows: 

Number Falling Upwards    1    2    3    4    5    6 
Frequency                27   33   31   29   30   24 

Is the die unbiased? Answer on the basis of the chi-square test. 


8. (i) 1000 babies were born during a certain week in a city, of which 600 
were boys and 400 girls. Use the χ² test to examine the correctness of the 
hypothesis that the sex ratio is 1:1 in newly born babies. 


(ii) The percentage of smokers in a certain city was 90. A random sample 
of 100 persons was selected in which 85 persons were found to be 
smokers. Is the sample proportion significantly different from the 
proportion of smokers in the city? Answer on the basis of chi-square 
test. 


9. How is the area under the normal curve measured? Give examples. 


10. Discuss the standard normal distribution. 

8.0 INTRODUCTION 


One of the major objectives of the field of statistical analysis is to know the true or 
actual values of different parameters of population. The ideal situation would be to 
take the entire population into consideration in determining these values. However, 
that is not feasible due to cost, time, labour and other constraints. Accordingly, 
random samples of a given size are taken from the population and these samples 
are properly analysed with the belief that the characteristics of these random samples 
represent similar characteristics of the population from which these samples have 
been taken. The results obtained from such analyses lead to generalizations that 
are considered to be valid for the entire population. 


You will study two broad types of sampling: probability samples and 
non-probability samples. Probability samples involve simple random sampling and 
restricted random sampling. Non-probability samples are characterized by 
non-random sampling. As simple random sampling is costly and time consuming, 
restricted random sampling is preferred. Sampling distribution of mean is a 
probability distribution of all possible sample means of a given size selected from 
a population, while sampling distribution of proportions is a distribution of 
proportions of all possible random samples of a fixed size. 


In this unit, you will study about the distribution of functions of random 
variable, sampling theory and transformation of variable of the discrete type. 


8.1 OBJECTIVES 


After going through this unit, you will be able to: 
• Analyse the distribution of functions of random variable 
• Understand the various sampling theories 
• Explain the transformation of variable of the discrete type 


8.2 DISTRIBUTION OF FUNCTIONS OF RANDOM VARIABLE 


A random variable takes on different values as a result of the outcomes of a random 
experiment. In other words, a function which assigns a numerical value to each 
element of the set of events that may occur (i.e., every element in the sample 
space) is termed a random variable. The value of a random variable is the general 
outcome of the random experiment. One should always make a distinction between 
the random variable and the values that it can take on. All these can be illustrated 
by a few examples shown in Table 8.1. 


Table 8.1 Random Variable 


Random Variable   Values of the Random Variable          Description of the Values of the Random Variable 
X                 0, 1, 2, 3, 4                          Possible number of heads in four tosses of a fair coin 
Y                 1, 2, 3, 4, 5, 6                       Possible outcomes in a single throw of a die 
Z                 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12     Possible outcomes from throwing a pair of dice 
M                 0, 1, 2, 3, ..., S                     Possible sales of newspapers by a newspaper boy, S representing his stock 

All the stated random variable assignments cover every possible outcome 
and each numerical value represents a unique set of outcomes. A random variable 
can be either discrete or continuous. If a random variable is allowed to take on 
only a limited number of values, it is a discrete random variable, but if it is allowed 
to assume any value within a given range, it is a continuous random variable. 
Random variables presented in Table 8.1 are examples of discrete random 
variables. We can have continuous random variables if they can take on any value 
within a range of values, for example, within 2 and 5; in that case we write the 
values of a random variable x as, 


2 ≤ x ≤ 5 
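
The idea of a random variable as a function on the sample space can be sketched in Python for two of the discrete examples discussed above (number of heads in four tosses, and the total on a pair of dice). This is only an illustrative sketch; the names `sample_space`, `X`, `values_of_X` and `values_of_Z` are assumptions introduced here.

```python
from itertools import product

# Sample space of four tosses of a coin: every sequence of H and T.
sample_space = list(product("HT", repeat=4))


def X(outcome):
    """Random variable X: the number of heads in one outcome."""
    return outcome.count("H")


# The distinct values X can take over the whole sample space.
values_of_X = sorted({X(outcome) for outcome in sample_space})

# Z: the total shown by a pair of dice.
values_of_Z = sorted({d1 + d2 for d1, d2 in product(range(1, 7), repeat=2)})

print(values_of_X)
print(values_of_Z)
```

Note the distinction drawn in the text: `X` is the rule (the random variable), while `values_of_X` lists the values it can take on.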

Techniques of Assigning Probabilities 


We can assign probability values to the random variables. Since the assignment of 
probabilities is not an easy task, we should observe the following rules in this context: 


(i) A probability cannot be less than zero or greater than one, i.e., 0 ≤ pr ≤ 1, 
where pr represents probability. 


(ii) The sum of all the probabilities assigned to each value of the random variable 
must be exactly one. 
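
The two rules can be checked mechanically. The following Python sketch is illustrative only; the function name `is_valid_assignment` and both example assignments are hypothetical. Exact fractions are used so that the sum can be compared with one without rounding error; with floating-point values a small tolerance would be needed instead.

```python
from fractions import Fraction


def is_valid_assignment(probabilities):
    """Check rule (i), each value in [0, 1], and rule (ii), the values sum to one."""
    values = list(probabilities.values())
    return all(0 <= p <= 1 for p in values) and sum(values) == 1


# A fair die: six values, each with probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

# A hypothetical faulty assignment whose probabilities add up to 5/4.
faulty = {"lives": Fraction(3, 4), "dies": Fraction(1, 2)}

print(is_valid_assignment(fair_die), is_valid_assignment(faulty))
```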


There are three techniques of assignment of probabilities to the values of the random 
variable that are as follows: 


(i) Subjective Probability Assignment: It is the technique of assigning 
probabilities on the basis of personal judgement. Such assignment may 
differ from individual to individual and depends upon the expertise of the 
person assigning the probabilities. It cannot be termed as a rational way 
of assigning probabilities, but is used when the objective methods cannot 
be used for one reason or the other. 


(ii) A-Priori Probability Assignment: It is the technique under which the 
probability is assigned by calculating the ratio of the number of ways in 
which a given outcome can occur to the total number of possible outcomes. 
The basic underlying assumption in using this procedure is that every possible 
outcome is likely to occur equally. However, at times the use of this technique 
gives ridiculous conclusions. For example, we have to assign probability to 
the event that a person of age 35 will live up to age 36. There are two 
possible outcomes, he lives or he dies. If the probability assigned in 
accordance with a-priori probability assignment is half then the same may 
not represent reality. In such a situation, probability can be assigned by 
some other techniques. 


(iii) Empirical Probability Assignment: It is an objective method of assigning 
probabilities and is used by the decision-makers. Using this technique the 
probability is assigned by calculating the relative frequency of occurrence 
of a given event over an infinite number of occurrences. However, in practice 
only a finite (perhaps very large) number of cases are observed and relative 
frequency of the event is calculated. The probability assignment through this 
technique may as well be unrealistic, if future conditions do not happen to 
be a reflection of the past. 


Thus, what constitutes the ‘best’ method of probability assignment can only 
be judged in the light of what seems best to depict reality. It depends upon 
the nature of the problem and also on the circumstances under which the 
problem is being studied. 
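
The difference between the a-priori and the empirical techniques can be sketched in Python for the event "an even face turns up" in a throw of a die. This is a simulation under stated assumptions: the helper names `a_priori` and `empirical`, the number of trials and the seed are all hypothetical choices made for illustration.

```python
import random
from fractions import Fraction


def a_priori(favourable, possible):
    """Ratio of favourable outcomes to equally likely possible outcomes."""
    return Fraction(len(favourable), len(possible))


def empirical(event, trial, n_trials, rng):
    """Relative frequency of the event over a finite number of trials."""
    hits = sum(1 for _ in range(n_trials) if event(trial(rng)))
    return hits / n_trials


faces = range(1, 7)
even_faces = [f for f in faces if f % 2 == 0]
p_prior = a_priori(even_faces, faces)

rng = random.Random(0)  # fixed seed so the run is repeatable
p_emp = empirical(lambda f: f % 2 == 0, lambda r: r.randint(1, 6), 20000, rng)

print(p_prior, p_emp)
```

The empirical value is only close to, not equal to, the a-priori value, because only a finite number of cases is observed.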


Probability Distribution Functions: Discrete and Continuous 

When a random variable X takes discrete values $x_1, x_2, \ldots, x_n$ with probabilities 
$p_1, p_2, \ldots, p_n$, we have a discrete probability distribution of X. 


The function p(x) for which $X = x_1, x_2, \ldots, x_n$ takes values $p_1, p_2, \ldots, p_n$ is 
the probability function of X. 


The variable is discrete because it does not assume all values. Its properties 
are: 

$$p(x_i) = \text{Probability that } X \text{ assumes the value } x_i = \text{Prob}(X = x_i) = p_i$$

$$p(x) \ge 0, \qquad \sum p(x) = 1$$


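For example, four coins are tossed and the number of heads X noted. X can take 
values 0, 1, 2, 3, 4 heads. 

$$p(X = 0) = \left(\frac{1}{2}\right)^4 = \frac{1}{16}$$

$$p(X = 1) = {}^4C_1 \left(\frac{1}{2}\right)\left(\frac{1}{2}\right)^3 = \frac{4}{16}$$

$$p(X = 2) = {}^4C_2 \left(\frac{1}{2}\right)^2\left(\frac{1}{2}\right)^2 = \frac{6}{16}$$

$$p(X = 3) = {}^4C_3 \left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right) = \frac{4}{16}$$

$$p(X = 4) = {}^4C_4 \left(\frac{1}{2}\right)^4 = \frac{1}{16}$$

[Figure: plot of the probabilities p(x); the vertical axis is marked from 2/16 to 6/16] 

$$\sum p(x) = \frac{1}{16} + \frac{4}{16} + \frac{6}{16} + \frac{4}{16} + \frac{1}{16} = 1$$

This is a discrete probability distribution. 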
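
The same probabilities can be reproduced with a short Python sketch that uses exact fractions. The function name `heads_distribution` is a hypothetical choice; `math.comb` needs Python 3.8 or later. Note that `Fraction` prints values in lowest terms, so 4/16 appears as 1/4 and 6/16 as 3/8.

```python
from fractions import Fraction
from math import comb


def heads_distribution(n_coins):
    """p(X = k) = C(n, k) (1/2)^n for the number of heads in n fair tosses."""
    half = Fraction(1, 2)
    return {k: comb(n_coins, k) * half ** n_coins for k in range(n_coins + 1)}


dist = heads_distribution(4)
for k, p in dist.items():
    print(k, p)
print("sum =", sum(dist.values()))
```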


Example 1: If a discrete variable X has the following probability function, then 
find (i) a (ii) p(X ≤ 3) (iii) p(X > 3). 

x       0    1    2     3        4        5 
p(x)    0    a    2a    $2a^2$   $4a^2$   2a 


Solution: The solution is obtained as follows: 

Since $\sum p(x) = 1$, 

$$0 + a + 2a + 2a^2 + 4a^2 + 2a = 1$$

$$\therefore\ 6a^2 + 5a - 1 = 0, \text{ so that } (6a - 1)(a + 1) = 0$$

$$a = \frac{1}{6} \quad \text{or} \quad a = -1 \ \text{(not admissible)}$$

For $a = \frac{1}{6}$, 

$$p(X \le 3) = 0 + a + 2a + 2a^2 = 2a^2 + 3a = \frac{5}{9}$$

$$p(X > 3) = 4a^2 + 2a = \frac{4}{9}$$
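
The arithmetic of Example 1 can be checked with exact fractions. In this sketch the probability function is read as 0, a, 2a, 2a², 4a², 2a for x = 0, 1, ..., 5, and the two required probabilities are read as p(X ≤ 3) and p(X > 3); the helper name `probabilities` is hypothetical.

```python
from fractions import Fraction


def probabilities(a):
    """Probability function of Example 1 for x = 0, 1, ..., 5."""
    return {0: 0 * a, 1: a, 2: 2 * a, 3: 2 * a**2, 4: 4 * a**2, 5: 2 * a}


# Roots of 6a^2 + 5a - 1 = 0.
candidates = [Fraction(1, 6), Fraction(-1)]

# Keep only roots that give probabilities in [0, 1] summing to one.
admissible = [
    a for a in candidates
    if sum(probabilities(a).values()) == 1
    and all(0 <= p <= 1 for p in probabilities(a).values())
]

a = admissible[0]
p = probabilities(a)
p_le_3 = sum(p[x] for x in range(0, 4))
p_gt_3 = p[4] + p[5]
print(a, p_le_3, p_gt_3)
```

The root a = −1 also satisfies the sum condition but gives negative probabilities, which is why it is not admissible.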


Continuous Probability Distributions 


When a random variate can take any value in the given interval a < x < b, it is a 
continuous variate and its distribution is a continuous probability distribution. 


Theoretical distributions are often continuous. They are useful in practice 
because they are convenient to handle mathematically. They can serve as good 
approximations to discrete distributions. 


The range of the variate may be finite or infinite. 


A continuous random variable can take all values in a given interval. A 
continuous probability distribution is represented by a smooth curve. 


The total area under the curve for a probability distribution is necessarily 
unity. The curve is always above the x axis because the area under the curve for 
any interval represents probability and probabilities cannot be negative. 

If X is a continuous variable, the probability of X falling in an interval with end 
points $z_1$, $z_2$ may be written $p(z_1 \le X \le z_2)$. 

Fig. 8.1 Continuous Probability Distribution 

A function p(x) is a probability density function if, 

$$\int_{-\infty}^{\infty} p(x)\,dx = 1, \qquad p(x) \ge 0, \quad -\infty < x < \infty,$$

i.e., the area under the curve p(x) is 1 and the probability of x lying between 
two values a, b, i.e., p(a < x < b), is non-negative. The most prominent example 
of a continuous probability function is the normal distribution. 
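
The two conditions on a density can be checked numerically for the normal distribution. The sketch below is illustrative only: it uses a simple trapezoidal rule, and the integration limits of ±10 stand in for ±∞ as an assumption (the area beyond them is negligible for the standard normal curve). The names `normal_pdf` and `integrate` are hypothetical.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def integrate(f, lower, upper, steps=20000):
    """Composite trapezoidal rule for the area under f between lower and upper."""
    h = (upper - lower) / steps
    total = 0.5 * (f(lower) + f(upper))
    for i in range(1, steps):
        total += f(lower + i * h)
    return total * h


total_area = integrate(normal_pdf, -10.0, 10.0)  # should be close to 1
p_between = integrate(normal_pdf, -1.0, 1.0)     # p(-1 < x < 1)
print(round(total_area, 6), round(p_between, 4))
```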


Cumulative Probability Function (CPF) 


The Cumulative Probability Function (CPF) shows the probability that x takes a 
value less than or equal to, say, z and corresponds to the area under the curve up 
to z: 

$$p(x \le z) = \int_{-\infty}^{z} p(x)\,dx$$

This is denoted by F(z). 
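
As a sketch of how the cumulative function is used, the following Python code evaluates it for a normal variable through the error function `math.erf`, a standard closed form for this particular distribution. The function names `normal_cdf` and `interval_probability` are hypothetical.

```python
import math


def normal_cdf(z, mu=0.0, sigma=1.0):
    """Cumulative probability p(x <= z) for a normal variable."""
    return 0.5 * (1.0 + math.erf((z - mu) / (sigma * math.sqrt(2.0))))


def interval_probability(z1, z2):
    """p(z1 <= x <= z2) as a difference of two cumulative probabilities."""
    return normal_cdf(z2) - normal_cdf(z1)


print(normal_cdf(0.0))
print(round(interval_probability(-1.96, 1.96), 4))
```

The probability of an interval is obtained as the difference of two cumulative values, i.e., the area up to the upper end point minus the area up to the lower end point.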
Extension to Bivariate Case: Elementary Concepts 


If in a bivariate distribution the data are quite large, they may be summarized in 
the form of a two-way table. In this table, the values of each variable are grouped into 
different classes (not necessarily the same for both the variables), keeping in view the 
same considerations as in the case of univariate distribution. In other words, a 
bivariate frequency distribution presents in a table pairs of values of two variables 
and their frequencies. 


For example, if there are m classes for the X-variable series and n classes 
for the Y-variable series, then there will be m × n cells in the two-way table. By 
going through the different pairs of the values (x, y) and using tally marks, we can 
find the frequency for each cell and thus get the so-called bivariate frequency table 
as shown in Table 8.2. 

Table 8.2 Bivariate Frequency Table 


                              x Series Classes (Mid Points) 
y Series Classes              x1     x2     ...    x      ...    xm      Total of Frequencies of y 
(Mid Points) 
y1 
y2 
... 
y                                                  f(x, y)               fy 
... 
yn 
Total of Frequencies of x                          fx                    Total: Σfx = Σfy = N 

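Here, f(x, y) is the frequency of the pair (x, y). The formula for computing 
the correlation coefficient between x and y for the bivariate frequency table is, 

$$r = \frac{N \sum xy\, f(x, y) - \left(\sum x f_x\right)\left(\sum y f_y\right)}{\sqrt{\left[N \sum x^2 f_x - \left(\sum x f_x\right)^2\right] \times \left[N \sum y^2 f_y - \left(\sum y f_y\right)^2\right]}}$$

where N is the total frequency. 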
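
The computation of the correlation coefficient from a bivariate frequency table can be sketched in Python. The sketch assumes the usual form of the formula, with a square root over the product of the two bracketed terms in the denominator; the function name `correlation_from_table`, the mid points and the cell frequencies are hypothetical values chosen only for illustration.

```python
import math


def correlation_from_table(x_mid, y_mid, freq):
    """Correlation coefficient; freq[i][j] is the frequency of (x_mid[j], y_mid[i])."""
    fx = [sum(row[j] for row in freq) for j in range(len(x_mid))]  # column totals
    fy = [sum(row) for row in freq]                                # row totals
    n = sum(fy)                                                    # total frequency N
    sum_xy = sum(
        freq[i][j] * x_mid[j] * y_mid[i]
        for i in range(len(y_mid))
        for j in range(len(x_mid))
    )
    sum_x = sum(x * f for x, f in zip(x_mid, fx))
    sum_y = sum(y * f for y, f in zip(y_mid, fy))
    sum_x2 = sum(x * x * f for x, f in zip(x_mid, fx))
    sum_y2 = sum(y * y * f for y, f in zip(y_mid, fy))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator


# Hypothetical table: 3 classes for x (columns) and 3 classes for y (rows).
x_mid = [1, 2, 3]
y_mid = [1, 2, 3]
freq = [
    [4, 2, 0],
    [1, 5, 2],
    [0, 3, 6],
]
r = correlation_from_table(x_mid, y_mid, freq)
print(round(r, 3))
```

When all the frequency lies on the diagonal of the table, every pair satisfies a linear relation and the function returns a coefficient of 1.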


Check Your Progress 


1. What is meant by subjective probability assignment? 
2. What is continuous probability distribution? 


8.3 SAMPLING THEORY 


A sample is a portion of the total population that is considered for study and 
analysis. For instance, if we want to study the income pattern of professors at City 
University of New York and there are 10,000 professors, then we may take a 
random sample of only 1,000 professors out of this entire population. Then this 
number of 1,000 professors constitutes a sample. The summary measure that 
describes a characteristic, such as average income of this sample is known as a 
statistic. 


Sampling is the process of selecting a sample from the population. It is technically and 
economically not feasible to take the entire population for analysis. So we must take 
a representative sample out of this population for the purpose of such analysis. A 
sample is part of the whole, selected in such a manner as to represent 
the whole. 


Random Sample 


It is a collection of items selected from the population in such a manner that each 
item in the population has exactly the same chance of being selected, so that the 
sample taken from the population would be truly representative of the population. 
The degree of randomness of selection would depend upon the process of selecting 
the items from the population. A true random sample would be free from all biases 
whatsoever. For example, if we want to take a random sample of five students 
from a class of twenty-five students, then each one of these twenty-five students 
should have the same chance of being selected in the sample. One way to do this 
would be writing the names of all students on separate but small pieces of paper, 
folding each piece of this paper in a similar manner, putting each folded piece into 
a container, mixing them thoroughly and drawing out five pieces of paper from this 
container. 


Sampling without Replacement 


The procedure used in the previous example is known as sampling without 
replacement, because each person can be selected only once: once a piece 
of paper is taken out of the container, it is kept aside so that the person whose 
name appears on this piece of paper has no chance of being selected again. 


Sampling with Replacement 


There are certain situations in which the piece of paper once selected and taken 
into consideration is put back into the container in such a manner that the same 
person has the same chance of being selected again as any other person. For 
example, if we are randomly selecting five persons for award of prizes so that 
each person is eligible for any and all prizes, then once the slip of paper is drawn 
out of the container and the prize is awarded to the person whose name appears 
on the paper, the same piece of paper is put back into the container and the same 
person has the same chance of winning the second prize as anybody else. 
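
The two procedures can be sketched with Python's standard `random` module, where `sample` draws without replacement and `choices` draws with replacement. The student labels and the seed are hypothetical and used only for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# A class of twenty-five students (hypothetical labels).
students = [f"Student {i}" for i in range(1, 26)]

# Without replacement: a name that is drawn is set aside, so no repeats.
without_replacement = rng.sample(students, k=5)

# With replacement: each slip goes back, so the same name may be drawn again.
with_replacement = rng.choices(students, k=5)

print(without_replacement)
print(with_replacement)
```

In the first list all five names are necessarily different; in the second a name may appear more than once, as in the prize example.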


Sample Selection 


The third step in the primary data collection process is selecting an adequate 
sample. It is necessary to take a representative sample from the population, since 
it is extremely costly, time-consuming and cumbersome to do a complete census. 
Then, depending upon the conclusions drawn from the study of the characteristics 
of such a sample, we can draw inferences about the similar characteristics of the 
population. If the sample is truly representative of the population, then the 
characteristics of the sample can be considered to be the same as those of the 
entire population. For example, the taste of soup in the entire pot of soup can be 
determined by tasting one spoonful from the pot if the soup is well stirred. Similarly, 
a small amount of blood sample taken from a patient can determine whether the 
patient’s sugar level is normal or not. This is so because the small sample of blood 
is truly representative of the entire blood supply in the body. 


Sampling is necessary because of the following reasons: First, as discussed earlier, 
it is not technically or economically feasible to take the entire population into 
consideration. Second, due to dynamic changes in business, industrial and social 
environment, it is necessary to make quick decisions based upon the analysis of 
information. Managers seldom have the time to collect and process data for the 
entire population. Thus, a sample is necessary to save time. The time element has 
further importance in that if the data collection takes a long time, then the values of 
some characteristics may change over the period of time so that data may no longer 
be up to date, thus defeating the very purpose of data analysis. Third, samples, if 
representative, may yield more accurate results than the total census. This is due to 
the fact that samples can be more accurately supervised and data can be more 
carefully selected. Additionally, because of the smaller size of the samples, the routine 
errors that are introduced in the sampling process can be kept at a minimum. Fourth, 
the quality of some products must be tested by destroying the products. For example, 
in testing cars for their ability to withstand accidents at various speeds, the 
environment of accidents must be simulated. Thus, a sample of cars must be selected 
and subjected to accidents by remote control. Naturally, the entire population of cars 
cannot be subjected to these accident tests and hence, a sample must be selected. 


One important aspect to be considered is the size of the sample. The sample 
size, which is the number of sampling units selected from the population for 
investigation, must be optimum. If the sample size is too small, it may not 
investigation—must be optimum. If the sample size is too small, it may not 
appropriately represent the population or the universe as it is known, thus leading 
to incorrect inferences. Too large a sample would be costly in terms of time and 
money. The optimum sample size should fulfil the requirements of efficiency, 
representativeness, reliability and flexibility. What is an optimum sample size is 
also open to question. Some experts have suggested that 5 per cent of the 
population properly selected would constitute an adequate sample, while others 
have suggested as high as 10 per cent depending upon the size of the population 
under study. However, proper selection and representation of the sample is more 
important than size itself. The following considerations may be taken into account 
in deciding about the sample size: 


(a) The larger the size of the population, the larger should be the sample size. 


(b) If the resources available do not put a heavy constraint on the sample size, 
a larger sample would be desirable. 


(c) If the samples are selected by scientific methods, a larger sample size 
would ensure greater degree of accuracy in conclusions. 


(d) A smaller sample could adequately represent the population, if the 
population consists of mostly homogeneous units. A heterogeneous universe 
would require a larger sample. 


Census and Sampling 


Under the census or complete enumeration survey method, data is collected for 
each and every unit (for example, person, consumer, employee, household, 
organization) of the population or universe, which is the complete set of entities 
that are of interest in any particular situation. In spite of the benefits of such 
an all-inclusive approach, it is infeasible in most situations. Besides the time 
and resource constraints of the researcher, an infinite or huge population, the incidental 
destruction of the population unit during the evaluation process (as in the case of 
bullets, explosives, etc.) and cases of data obsolescence (by the time the census ends) 
do not permit this mode of data collection. 


Sampling is simply a process of learning about the population on the basis 
of a sample drawn from it. Thus, in any sampling technique, instead of every unit 
of the universe, only a part of the universe is studied and the conclusions are 
drawn on that basis for the entire population. The process of sampling involves 
selection of a sample based on a set of rules, collection of information and making 
an inference about the population. It should be clear to the researcher that a sample 
is studied not for its own sake, but the basic objective of its study is to draw 
inference about the population. In other words, sampling is a tool which helps us 
know the characteristics of the universe or the population by examining only a 
small part of it. The values obtained from the study of a sample, such as the 
average and dispersion are known as ‘statistics’ and the corresponding such values 
for the population are called ‘parameters’. 


Although diversity is a universal quality of mass data, every population has 
characteristic properties with limited variation. The following two laws of statistics 
are very important in this regard. 


1. The law of statistical regularity states that a moderately large number of 
items chosen at random from a large group are almost sure on the 
average to possess the characteristics of the large group. By random 
selection, we mean a selection where each and every item of the 
population has an equal chance of being selected. 


2. The law of inertia of large numbers states that, other things being equal, 
the larger the size of the sample, the more accurate the results are likely to be. 


Hence, a sound sampling procedure should result in a representative, 
adequate and homogeneous sample while ensuring that the selection of items should 
occur independently of one another. 
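
The law of inertia of large numbers can be illustrated by a small simulation. The sketch below throws a fair die, whose population mean is 3.5, and compares the average error of the sample mean for a small and a large sample size. The function name `mean_abs_error`, the sample sizes, the number of trials and the seed are hypothetical choices.

```python
import random
import statistics


def mean_abs_error(sample_size, trials, rng):
    """Average |sample mean - 3.5| over repeated samples of die throws."""
    errors = []
    for _ in range(trials):
        sample = [rng.randint(1, 6) for _ in range(sample_size)]
        errors.append(abs(statistics.mean(sample) - 3.5))
    return statistics.mean(errors)


rng = random.Random(1)  # fixed seed so the run is repeatable
small = mean_abs_error(10, 300, rng)
large = mean_abs_error(1000, 300, rng)
print(round(small, 3), round(large, 3))
```

On average the larger samples give a sample mean much closer to the population mean, which is what the law states.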


Sampling Techniques 


The various methods of sampling can be grouped under two broad categories: 
Probability (or random) sampling and Non-probability (or non-random) sampling. 


Probability sampling methods are those in which every item in the universe 
has a known chance, or probability of being chosen for the sample. Thus, the 
sample selection process is objective (independent of the person making the study) 
and hence, random. It is worth noting that randomness is a property of the sampling 
procedure instead of an individual sample. As such, randomness can enter 
the sampling process in a number of ways and hence, random samples may be of 
many types. These methods include: (a) Simple Random Sampling, (b) Stratified 
Random Sampling, (c) Systematic Sampling and (d) Cluster Sampling. 


Non-probability sampling methods are those which do not provide every 
item in the universe with a known chance of being included in the sample. The 
selection process is, at least, partially subjective (dependent on the person making 
the study). The most important difference between random and non-random 
sampling is that whereas the pattern of sampling variability can be ascertained in 
case of random sampling, there is no way of knowing the pattern of variability in 
the non-random sampling process. The non-probability methods include: 
(a) Judgement Sampling, (b) Quota Sampling and (c) Convenience Sampling. 


Figure 8.2 depicts the broad classification and sub-classification 
of the various methods of sampling. 


                          Sampling Methods 
                                 | 
            +--------------------+--------------------+ 
            |                                         | 
   Non-probability Samples                   Probability Samples 
            |                                         | 
     Judgement Sampling                     Simple Random Sampling 
     Quota Sampling                         Stratified Sampling 
     Convenience Sampling                   Systematic Sampling 
                                            Cluster Sampling 


Fig. 8.2 Methods of Sampling 


Non-Probability Sampling Methods 


(a) Judgement Sampling: In this method of sampling, the choice of sample 
items depends exclusively on the judgement of the investigator. The sample 
here is based on the opinion of the researcher, whose discretion will clinch 
the sample. Though the principles of sampling theory are not applicable to 
judgement sampling, it is sometimes found to be useful. When we want to 
study some unknown traits of a population, some of whose characteristics 
are known, we may then stratify the population according to these known 
properties and select sampling units from each stratum on the basis of 
judgement. Naturally, the success of this method depends upon the 
excellence in judgement. 


(b) Convenience Sampling: A convenience sample is obtained by selecting 
convenient population units. It is also called a chunk, which refers to that 
fraction of the population being investigated, which is selected neither by 
probability nor by judgement but by convenience. A sample obtained from 
readily available lists such as telephone directories is a convenience sample 
and not a random sample, even if the sample is drawn at random from such 
lists. In spite of the biased nature of such a procedure, convenience sampling 
is often used for pilot studies. 

(c) Quota Sampling: Quota sampling is a type of judgement sampling and is 
perhaps the most commonly used sampling technique in non-probability 
category. In a quota sample, quotas (or minimum targets) are set up 
according to some specified characteristics, such as age, income group, 
religious or political affiliations, and so on. Within the quota, the selection of 
the sample items depends on personal judgement. Because of the risk of 
personal prejudice entering the sample selection process, the quota sampling 
is not widely used in practical works. 


It is worth noting that similarity between quota sampling and stratified 
random sampling is confined to dividing the population into different strata. 
The process of selection of items from each of these strata in the case of 
stratified random sampling is random, while it is not so in the case of quota 
sampling. Quota sampling is often used in public opinion studies. 


Probability Sampling Methods 


The following are the probability sampling methods: 
(a) Simple Random Sampling 
(b) Stratified Random Sampling 
(c) Systematic Sampling 
(d) Multistage or Cluster Sampling 

Simple Random Sampling 


It refers to that sampling technique in which each and every unit of the population 
has an equal chance of being selected in the sample. One should not mistake the 
term ‘arbitrary’ for ‘random’. To ensure randomness, one may adopt either the 
lottery method or consult the table of random numbers, preferably the latter. Being 
a random method, it is independent of personal bias creeping into the analysis 
besides enhancing the representativeness of the sample. Furthermore, it is easy to 
assess the accuracy of the sampling estimates because sampling errors follow the 
principles of chance. However, a completely catalogued universe is a prerequisite 
for this method. The sample size requirements would be usually larger under 
random sampling than under stratified random sampling, to ensure statistical 
reliability. It may escalate the cost of collecting data as the cases selected by 
random sampling tend to be too widely dispersed geographically.

Stratified Random Sampling


In this method, the universe to be sampled is subdivided (stratified) into groups 
which are mutually exclusive, but collectively exhaustive based on a variable known 
to be correlated with the variable of interest. Then, a simple random sample is 
chosen independently from each group. This method differs from simple random 
sampling in that, in the latter the sample items are chosen at random from the entire 
universe. In stratified random sampling, the sampling is designed in such a way 
that a designated number of items is chosen from each stratum. If the ratio of 
items between various strata in the population matches with the ratio of 
corresponding items between various strata in the sample, it is called proportionate 
stratified sampling; otherwise, it is known as disproportionate stratified sampling. 
Ideally, we should assign greater representation to a stratum with a larger dispersion 
and smaller representation to one with small variation. Hence, it results in a more 
representative sample than simple random sampling. 
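
Proportionate stratified sampling can be sketched in Python as follows. This is only an illustration: the function name `proportionate_stratified_sample`, the `stratum_of` callback and the urban/rural households are assumptions introduced here, not part of the text.

```python
import random
from collections import defaultdict


def proportionate_stratified_sample(units, stratum_of, n, seed=None):
    """Draw about n units, allocating to each stratum in proportion to its size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Proportionate allocation; rounding means the total may differ slightly from n.
        k = round(n * len(members) / len(units))
        # A simple random sample is drawn independently within each stratum.
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical universe: 60 urban and 40 rural households.
households = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
sample = proportionate_stratified_sample(households, lambda h: h[0], n=10, seed=1)
counts = {s: sum(1 for h in sample if h[0] == s) for s in ("urban", "rural")}
print(counts)  # {'urban': 6, 'rural': 4}
```

A disproportionate design would replace the allocation line with stratum sizes chosen by the investigator, for instance larger for strata with greater dispersion.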


Systematic Sampling 


It is also known as quasi-random sampling method because once the initial starting
point is determined, the remainder of the items selected for the sample are 
predetermined by the sampling interval. A systematic sample is formed by selecting 
one unit at random and then selecting additional units at evenly spaced intervals 
until the sample has been formed. This method is popularly used in those cases 
where a complete list of the population from which sample is to be drawn is 
available. The list may be prepared in alphabetical, geographical, numerical or 
some other order. The items are serially numbered. The first item is selected at 
random generally by following the lottery method. Subsequent items are selected 
by taking every Kth item from the list where ‘K’ stands for the sampling interval or 
the sampling ratio, i.e., the ratio of the population size to the size of the sample. 


Symbolically, 


K = N/n , where K = Sampling Interval; N = Universe Size; n= Sample 
Size. In case K is a fractional value, it is rounded off to the nearest integer. 
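
The selection rule K = N/n can be sketched in Python as below. The function name `systematic_sample` and the list of 100 serially numbered items are illustrative assumptions; the random start is drawn with a seeded generator rather than by the lottery method.

```python
import random


def systematic_sample(population, n, seed=None):
    """Select every K-th item after a random start, where K = N/n rounded."""
    N = len(population)
    k = max(1, round(N / n))  # sampling interval K
    rng = random.Random(seed)
    start = rng.randrange(k)  # random starting point within the first interval
    # When N/n is not an integer, rounding K can yield slightly fewer than n items.
    return population[start::k][:n]


items = list(range(1, 101))  # serially numbered list, N = 100
chosen = systematic_sample(items, n=10, seed=7)
print(chosen)
```

Once the starting point is fixed, every later item is determined by the interval, which is why the method is called quasi-random.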


Cluster Sampling 


Under this method, the random selection is made of primary, intermediate and
final (or the ultimate) units from a given population or stratum. There are several
stages in which the sampling process is carried out. At first, the first stage units are
sampled by some suitable method such as simple random sampling. Then, a sample
of second stage units is selected from each of the selected first stage units, by
applying some suitable method which may or may not be the same method
employed for the first stage units. For example, in a survey of 10,000 households
in AP, we may choose a few districts in the first stage, a few towns/villages/mandals
in the second stage and select a number of households from each town/village/mandal
selected in the previous stage. This method is quite flexible and is
particularly useful in surveys of underdeveloped areas, where no frame is generally
sufficiently detailed and accurate for subdivision of the material into reasonably
small sampling units. However, a multistage sample is, in general, less accurate
than a sample containing the same number of final stage units which have been
selected by some suitable single stage process.


Sampling and Non-Sampling Errors 


The basic objective of a sample is to draw inferences about the population from 
which such sample is drawn. This means that sampling is a technique which helps 
us in understanding the parameters or the characteristics of the universe or the 
population by examining only a small part of it. Therefore, it is necessary that the 
sampling technique be a reliable one. The randomness of the sample is especially 
important because of the principle of statistical regularity, which states that a sample 
taken at random from a population is likely to possess almost the same 
characteristics as those of the population. However, in the total process of statistical 
analysis, some errors are bound to be introduced. These errors may be the sampling
errors or the non-sampling errors. The sampling errors arise due to drawing faulty 
inferences about the population based upon the results of the samples. In other 
words, it is the difference between the results that are obtained by the sample 
study and the results that would have been obtained if the entire population was 
taken for such a study, provided that the same methodology and manner was 
applied in studying both the sample as well as the population. For example, if a 
sample study indicates that 25 per cent of the adult population of a city does not 
smoke and the study of the entire adult population of the city indicates that 30 per 
cent are non-smokers, then this difference would be considered as the sampling 
error. This sampling error would be smallest if the sample size is large relative to 
the population, and vice versa. 


Non-sampling errors, on the other hand, are introduced due to technically 
faulty observations or during the processing of data. These errors could also arise 
due to defective methods of data collection and incomplete coverage of the 
population, because some units of the population are not available for study, 
inaccurate information provided by the participants in the sample and errors 
occurring during editing, tabulating and mathematical manipulation of data. These 
are the errors which can arise even when the entire population is taken under 
study. 


Both the sampling as well as the non-sampling errors must be reduced to a 
minimum in order to get as representative a sample of the population as possible. 
Theory of Estimation 


The theory of estimation is a very common and popular statistical method and is 
used to calculate the mathematical model for the data to be considered. This 
method was introduced by the statistician Sir R. A. Fisher, between 1912 and 
1922. This method can be used in: 


• Finding linear models and generalized linear models
• Structural equation modelling
• Calculating time-delay of arrival (TDOA) in acoustic or electromagnetic
  detection
• Data modelling in nuclear and particle physics
• Finding the result for hypothesis testing


The method of estimation is used with known mean and variance. The sample 
mean becomes the maximum likelihood estimator of the population mean, and the 
sample variance becomes the close approximation to the maximum likelihood 
estimator of the population variance. 


Interval Estimate of the Population Mean (Population Variance Known) 


Since the sample means are normally distributed, with a mean of $\mu$ and a standard
deviation of $\sigma_{\bar{x}}$, it follows that sample means follow normal distribution
characteristics. Transforming the sampling distribution of sample means into the
standard normal distribution, we get:

$$Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$

Since $\mu$ falls within a range of values equidistant from $\bar{x}$,

$$\mu = \bar{x} \pm Z\sigma_{\bar{x}}$$

This relationship is shown in the following illustration.

[Figure: sampling distribution of the mean, with $\bar{x}_1$ and $\bar{x}_2$ marked on either side of $\bar{x}$]

This means that the population mean is expected to lie between the values of $\bar{x}_1$
and $\bar{x}_2$, which are both equidistant from $\bar{x}$, and this distance depends upon the
value of Z which is a function of confidence level.

Suppose that we wanted to find out a confidence interval around the sample
mean within which the population mean is expected to lie 95 per cent of the time.
(We can never be sure that the population mean will lie in any given interval 100
per cent of the time.) This confidence interval is shown as follows:

[Figure: normal curve with the central 95% area split into 47.5% on each side of $\bar{x}$, and 2.5% in each tail beyond $\bar{x}_1$ and $\bar{x}_2$]


The points $\bar{x}_1$ and $\bar{x}_2$ above define the range of the confidence interval as follows:

$$\bar{x}_1 = \bar{x} - Z\sigma_{\bar{x}} \quad\text{and}\quad \bar{x}_2 = \bar{x} + Z\sigma_{\bar{x}}$$

Looking at the table of Z scores (given in the Appendix), we find that the value of
Z score for area 0.4750 (half of 95 per cent) is 1.96. This illustration can be
interpreted as follows:


a) If all possible samples of size n were taken, then on the average 95 per cent of
these samples would include the population mean within the interval around
their sample means bounded by $\bar{x}_1$ and $\bar{x}_2$.

b) If we took a random sample of size n from a given population, the probability
is 0.95 that the population mean would lie between the interval $\bar{x}_1$ and $\bar{x}_2$
around the sample mean, as shown.

c) If a random sample of size n was taken from a given population, we can be 95
per cent confident in our assertion that the population mean will lie around the
sample mean in the interval bounded by values of $\bar{x}_1$ and $\bar{x}_2$ as shown. (It is
also known as 95 per cent confidence interval.) At 95 per cent confidence
level, the value of Z score as taken from the Z score table is 1.96. The
value of Z score can be found for any given level of confidence, but generally
speaking, a confidence level of 90%, 95% or 99% is taken into consideration
for which the Z score values are 1.645, 1.96 and 2.58, respectively.
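
The Z scores read from the table can also be computed directly. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_for_confidence` is an assumption made for illustration.

```python
from statistics import NormalDist


def z_for_confidence(confidence):
    """Two-sided critical value Z for a given confidence level (e.g. 0.95)."""
    # Half of the confidence area lies on each side of the mean.
    return NormalDist().inv_cdf(0.5 + confidence / 2)


for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))
```

For a 95 per cent level the argument passed to `inv_cdf` is 0.975, which corresponds to the area 0.4750 on one side of the mean plus the 0.5 below it.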


Example 2: The sponsor of a television programme targeted at the children’s
market (age 4-10 years) wants to find out the average amount of time children
spend watching television. A random sample of 100 children indicated the average
time spent by these children watching television per week to be 27.2 hours. From
previous experience, the population standard deviation of the weekly extent of
television watched ($\sigma$) is known to be 8 hours. A confidence level of 95 per cent is
considered to be adequate.

[Figure: normal curve centred on $\bar{x} = 27.2$ with limits $\bar{x}_1$ and $\bar{x}_2$; $\sigma = 8$]

The confidence interval is given by:

$$\bar{x} \pm Z\sigma_{\bar{x}} \quad\text{or}\quad \bar{x} - Z\sigma_{\bar{x}} < \mu < \bar{x} + Z\sigma_{\bar{x}}$$

where $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.

Accordingly, we need only four values, namely $\bar{x}$, Z, $\sigma$ and n. In our case:

$$\bar{x} = 27.2, \quad Z = 1.96, \quad \sigma = 8, \quad n = 100$$

Hence,

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{100}} = \frac{8}{10} = 0.8$$

Then:

$$\bar{x}_1 = \bar{x} - Z\sigma_{\bar{x}} = 27.2 - (1.96 \times 0.8) = 27.2 - 1.568 = 25.632$$

And

$$\bar{x}_2 = \bar{x} + Z\sigma_{\bar{x}} = 27.2 + (1.96 \times 0.8) = 27.2 + 1.568 = 28.768$$


This means that we can conclude with 95 per cent confidence that a child on an 
average spends between 25.632 and 28.768 hours per week watching television. 
(It should be understood that 5 per cent of the time our conclusion would still be
wrong. This means that because of the symmetry of distribution, we will be wrong
2.5 per cent of the times because the children on an average would be watching
television more than 28.768 hours and another 2.5 per cent of the time we will be
wrong in our conclusion, because on an average, the children will be watching
television less than 25.632 hours per week.)

Example 3: Calculate the confidence interval in the previous problem, if we want
to increase our confidence level from 95% to 99%. Other values remain the same.


Solution: 


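[Figure: normal curve with area 0.495 on each side of $\bar{x} = 27.2$ and 0.005 in each tail beyond $\bar{x}_1$ and $\bar{x}_2$; $\sigma = 8$]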


If we increase our confidence level to 99 per cent, then it would be natural to 
assume that the range of the confidence interval would be wider, because we 
would want to include more values which may be greater than 28.768 or smaller 
than 25.632 within the confidence interval range. Accordingly, in this new situation, 


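$$Z = 2.58, \quad \sigma_{\bar{x}} = 0.8$$

Then

$$\bar{x}_1 = \bar{x} - Z\sigma_{\bar{x}} = 27.2 - (2.58 \times 0.8) = 27.2 - 2.064 = 25.136$$

And

$$\bar{x}_2 = \bar{x} + Z\sigma_{\bar{x}} = 27.2 + 2.064 = 29.264$$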


(The value of Z is established from the table of Z scores against the area of .495 or 
a figure closest to it. The table shows that the area close to .495 is .4949 for which 
the Z score is 2.57 or .4951 for which the Z score is 2.58. In practice, the Z score 
of 2.58 is taken into consideration when calculating 99 per cent confidence interval.) 
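
The arithmetic of Examples 2 and 3 can be reproduced with a short Python sketch. The function name `mean_confidence_interval` is an assumption made for illustration; the Z values are passed in as read from the table.

```python
import math


def mean_confidence_interval(xbar, sigma, n, z):
    """Return (lower, upper) limits xbar -/+ z * sigma / sqrt(n)."""
    standard_error = sigma / math.sqrt(n)
    margin = z * standard_error
    return xbar - margin, xbar + margin


ci_95 = mean_confidence_interval(27.2, 8, 100, 1.96)  # Example 2
ci_99 = mean_confidence_interval(27.2, 8, 100, 2.58)  # Example 3
print([round(v, 3) for v in ci_95])  # [25.632, 28.768]
print([round(v, 3) for v in ci_99])  # [25.136, 29.264]
```

Raising the confidence level from 95 to 99 per cent leaves the standard error unchanged and widens the interval only through the larger Z.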


Interval Estimate of the Population Mean (Population Variance Unknown) 


As the previous example shows, in order to determine the interval estimate of $\mu$,
the variance (and hence, the standard deviation) must be known, since it figures in
the formula. However, the standard deviation of the population is generally not
known. In such situations and when sample size is reasonably large (30 or more),
we can approximate the population standard deviation ($\sigma$) by the sample standard
deviation (s), so that the confidence interval

$$\bar{x} \pm Z\sigma_{\bar{x}}$$

is approximated by the interval

$$\bar{x} \pm Z s_{\bar{x}}, \quad\text{when } n \geq 30,$$

where $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$ and $s_{\bar{x}} = \dfrac{s}{\sqrt{n}}$.


Example 4: It is desired to estimate the average age of students who graduate
with an MBA degree in the university system. A random sample of 64 graduating
students showed that the average age was 27 years with a standard deviation of 4
years.

a) Estimate a 95 per cent confidence interval estimate of the true average
(population mean) age of all such graduating students at the university.

b) How would the confidence interval limits change if the confidence level was
increased from 95 per cent to 99 per cent?

Solution: Since the sample size n is sufficiently large, we can approximate the
population standard deviation by the sample standard deviation.


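(a) Now,

$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{4}{\sqrt{64}} = \frac{4}{8} = 0.5$$

The 95% confidence interval of the population mean $\mu$ is given by:

$$\bar{x} \pm Z s_{\bar{x}}$$

So that,

$$\bar{x}_1 = \bar{x} - Z s_{\bar{x}} = 27 - (1.96 \times 0.5) = 27 - 0.98 = 26.02$$

And

$$\bar{x}_2 = \bar{x} + Z s_{\bar{x}} = 27 + 0.98 = 27.98$$

Hence, $26.02 < \mu < 27.98$.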


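(b) As before, $s_{\bar{x}} = 0.5$.

[Figure: normal curve centred on $\bar{x} = 27$ with limits $\bar{x}_1$ and $\bar{x}_2$]

Now, Z becomes 2.58 and the other values remain the same. Hence,

$$\bar{x}_1 = \bar{x} - Z s_{\bar{x}} = 27 - (2.58 \times 0.5) = 27 - 1.29 = 25.71$$

And

$$\bar{x}_2 = \bar{x} + Z s_{\bar{x}} = 27 + 1.29 = 28.29$$

Hence, $25.71 < \mu < 28.29$.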
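
A minimal Python sketch of the large-sample interval follows. It assumes the sample standard deviation stands in for the population value when n is 30 or more, as stated above; the function name `large_sample_ci` is hypothetical, and Z is computed with `statistics.NormalDist` instead of being read from the table.

```python
import math
from statistics import NormalDist


def large_sample_ci(xbar, s, n, confidence):
    """Approximate CI for the mean using the sample standard deviation s (n >= 30)."""
    if n < 30:
        raise ValueError("the normal approximation here assumes n >= 30")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * s / math.sqrt(n)
    return xbar - margin, xbar + margin


# Example 4: n = 64, sample mean 27 years, sample standard deviation 4 years.
for level in (0.95, 0.99):
    lower, upper = large_sample_ci(27, 4, 64, level)
    print(level, round(lower, 2), round(upper, 2))
```

Rounded to two decimals, the limits agree with those obtained by hand in parts (a) and (b).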
Sample Size Determination for Estimating the Population Mean 


It is understood that the larger the sample size, the closer the sample statistic will 
be to the population parameter. Hence, the degree of accuracy we require in our 
estimate would be one factor influencing our choice of sample size. The second 
element that influences the choice of the sample size is the degree of confidence in 
ourselves that the error in the estimate remains within the degree of accuracy that 
is desired. Hence, the degree of accuracy has two aspects. 

1. The maximum allowable error in our estimate


2. The degree of confidence that the error in our estimate will not exceed the 
maximum allowable error 


The ideal situation would be that the sample mean $\bar{x}$ equals the population mean
$\mu$. That would be the best estimate of $\mu$ based on $\bar{x}$. If the entire population was
taken as a sample then $\bar{x}$ will be equal to $\mu$ and there will be no error in our
estimate. Hence, $(\bar{x} - \mu)$ can be considered as error or deviation of the estimator
$\bar{x}$ from the population mean $\mu$. This maximum allowable error must be
pre-established. Let this error be denoted by E, so that:

$$E = (\bar{x} - \mu)$$


Now, we know that

$$Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{E}{\sigma/\sqrt{n}}$$

Or

$$Z = \frac{E\sqrt{n}}{\sigma}$$

$$Z\sigma = E\sqrt{n}$$

$$\sqrt{n} = \frac{Z\sigma}{E}$$

$$n = \left(\frac{Z\sigma}{E}\right)^2 = \frac{Z^2\sigma^2}{E^2}$$


Based upon this formula, it can be seen that the size of the sample depends upon: 

(a) Confidence level desired. This will determine the value of Z. For example,
a 95 per cent confidence level yields Z = 1.96.

(b) Maximum error allowed (E)

(c) The standard deviation of the population ($\sigma$)

It can further be seen from this formula that the sample size will increase if:


(a) The allowable error becomes smaller 

(b) The degree of confidence increases 

(c) The value of the variance within the population is larger 

Example 5: We would like to know the average time that a child spends watching 
television over the weekend. We want our estimate to be within ± 1 hour of the
true population average. (This means that the maximum allowable error is 1 hour.) 
Previous studies have shown the population standard deviation to be 3 hours. 
What sample size should be taken for this purpose, if we want to be 95 per cent 
confident that the error in our estimate will not exceed the maximum allowable 
error? 


Solution: For 95 per cent confidence level, the values are:

Z = 1.96
E = 1 hour (given)
$\sigma$ = 3 hours (given)

$$n = \frac{Z^2\sigma^2}{E^2} = \frac{(1.96)^2 (3)^2}{(1)^2} = 34.57$$


To be more accurate in our estimate, we always round off the answer to the next 
higher figure from the decimal. Hence, n= 35. 
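
The sample size formula can be sketched in Python as below. The function name `sample_size_for_mean` is an assumption for illustration, and the result is rounded up to the next whole unit.

```python
import math


def sample_size_for_mean(z, sigma, max_error):
    """n = (Z * sigma / E)^2, rounded up to the next whole unit."""
    n_exact = (z * sigma / max_error) ** 2
    # round() guards against floating-point noise before taking the ceiling.
    return math.ceil(round(n_exact, 6))


print(sample_size_for_mean(1.96, 3, 1))  # Example 5: 34.57 rounds up to 35
```

Because E appears squared in the denominator, halving the allowable error roughly quadruples the required sample size.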


Confidence Interval Estimation of Population Proportion 


So far we have discussed the estimation of population mean, which is quantitative
in nature. This concept of estimation can be extended to qualitative data where the
data is available in proportion or percentage form. In this situation the parameter
of interest is $\pi$, which is the proportion of times a certain desirable outcome occurs.
This concept lends itself to binomial distribution where we label the outcome of
interest to us as success with the probability of success being $\pi$ and the probability
of failure being $(1 - \pi)$.

When large samples of size n are selected from a population having a proportion
of desirable outcomes $\pi$, then the sampling distribution of proportions is normally
distributed. For large samples, when (np) as well as (nq) are both at least equal to
5, where n is the sample size, p is the probability of a desired outcome (or success)
and q is the probability of failure (1 - p), then the binomial distribution can also be
approximated to normal distribution, with a mean of $\pi$ and a standard deviation of
$\sigma_p$, where $\sigma_p$ is given by:

$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$$

In such cases, we expect 95 per cent of all sample proportions to fall within the
following range:

$$\pi \pm 1.96\,\sigma_p$$

If all possible samples of size n are selected and the interval $p \pm 1.96\,\sigma_p$ is
established for each sample, where p is the sample proportion, then 95 per cent of
all such intervals are expected to contain $\pi$, the population proportion. Then this
range of

$$p \pm 1.96\,\sigma_p$$

Or

$$p \pm 1.96\sqrt{\frac{\pi(1 - \pi)}{n}}$$

is known as the 95 per cent confidence interval estimate of $\pi$.


This formula requires that we know the value of $\pi$ in order to calculate $\sigma_p$. But,
population proportion is generally not known. In such instances, the sample
proportion is used as an approximation of $\pi$. Hence, the 95 per cent confidence
interval estimate of $\pi$ becomes:

$$p \pm 1.96\,\sigma_p$$

Where

$$\sigma_p = \sqrt{\frac{p(1 - p)}{n}}$$

And the 99 per cent confidence interval estimate of $\pi$ becomes:

$$p \pm 2.58\,\sigma_p$$


Example 6: A survey of 500 persons shopping at a mall, selected at random, 
showed that 350 of them used credit cards for their purchases and 150 of them 
used cash. 


(a) Construct a 95 per cent confidence interval estimate of the proportion of all 
persons at the mall, who use credit card for shopping. 


(b) What would our confidence level be, if we make the assertion that the proportion 
of shoppers at the mall who shop with a credit card is between 67 per cent 
and 73 per cent. 


Solution: (a) There are 350 people out of a total sample of 500 who pay by 
credit card. Hence, the sample proportion of credit card shoppers is:


p = 350/500 = 0.7. 


The 95 per cent confidence interval estimate of population proportion $\pi$ is given
as follows:

$$p \pm 1.96\,\sigma_p$$

Where,

$$\sigma_p = \sqrt{\frac{p(1 - p)}{n}}$$

(Since $\pi$ is not known, we approximate the population proportion $\pi$ by the
sample proportion p.)

Then,

$$\sigma_p = \sqrt{\frac{(0.7)(0.3)}{500}} = \sqrt{0.00042} \approx 0.02$$

Then the confidence limits are:

$$p_1 = p - 1.96\,\sigma_p = 0.7 - 1.96(0.02) = 0.7 - 0.0392 = 0.6608 \text{ or } 66.08\%$$

and

$$p_2 = p + 1.96\,\sigma_p = 0.7 + 0.0392 = 0.7392 \text{ or } 73.92\%$$

[Figure: normal curve centred on $p = 0.7$ with limits $p_1 = 0.6608$ and $p_2 = 0.7392$; $\sigma_p = 0.02$]

This means that, with 95 per cent confidence, the proportion of people who pay
by credit card at the mall is between 66.08 per cent and 73.92 per cent.


(b) If the population proportion of credit card shoppers is given to be between
0.67 and 0.73, when such sample proportion p is 0.70, then

[Figure: normal curve centred on $p = 0.7$ with limits $p_1 = 0.67$ and $p_2 = 0.73$; area 0.4332 on each side of the mean]

$$p_1 = p - Z\sigma_p$$

$$0.67 = 0.7 - Z(0.02)$$

$$0.02\,Z = 0.7 - 0.67$$

$$Z = \frac{0.7 - 0.67}{0.02} = \frac{0.03}{0.02} = 1.5$$

Similarly,

$$0.73 = 0.7 + Z(0.02)$$

$$Z = \frac{0.73 - 0.7}{0.02} = \frac{0.03}{0.02} = 1.5$$

Using the Z score table, we see that the area under the curve for Z=1.5 is .4332. 
This area is on each side of the mean so that the total area is .8664. In other 
words, our confidence level is 86.64 per cent that the proportion of shoppers 
using credit card is between 67 per cent and 73 per cent. 
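
The two parts of Example 6 can be checked with a short Python sketch. The helper names `proportion_standard_error` and `confidence_for_half_width` are assumptions made for illustration. Note that the worked solution rounds the standard error to 0.02, so limits computed from the unrounded value differ slightly in the third decimal place.

```python
import math
from statistics import NormalDist


def proportion_standard_error(p, n):
    """Estimated standard error sqrt(p(1 - p) / n) of a sample proportion."""
    return math.sqrt(p * (1 - p) / n)


def confidence_for_half_width(half_width, standard_error):
    """Confidence level attached to the interval p -/+ half_width."""
    z = half_width / standard_error
    return 2 * NormalDist().cdf(z) - 1


se = proportion_standard_error(0.7, 500)
print(round(se, 4))  # 0.0205, which the text rounds to 0.02

# Part (b), using the rounded standard error 0.02 as the text does.
print(round(confidence_for_half_width(0.03, 0.02), 4))  # 0.8664
```

The second function runs the usual calculation in reverse: instead of choosing Z from a confidence level, it recovers the confidence level from a stated interval width.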


Sample Size Determination for Estimating the Population Proportion 


We follow the same procedure as we did in determining the sample size for estimating 
the population mean. As before, there are three factors that are taken into 
consideration. These are: 


(a) The level of confidence desired 
(b) The maximum allowable error permitted in the estimate 
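(c) The estimated population proportion of success $\pi$

As established previously,

$$Z = \frac{p - \pi}{\sigma_p}$$

Where

$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$$

Now, $(p - \pi)$ can be considered as error (E), so that:

$$Z = \frac{E}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$$

By cross-multiplication we get,

$$E = Z\sqrt{\frac{\pi(1 - \pi)}{n}}$$

Squaring both sides we get,

$$E^2 = \frac{Z^2\pi(1 - \pi)}{n}$$

or

$$nE^2 = Z^2\pi(1 - \pi)$$

$$n = \frac{Z^2\pi(1 - \pi)}{E^2}$$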

This formula assumes that we know $\pi$, the population proportion, which we are
trying to estimate in the first place. Accordingly, $\pi$ is unknown. However, if any
previous studies have estimated this value or a sample proportion p has been
calculated in previous studies, then we can approximate $\pi$ by this p and hence,

$$n = \frac{Z^2 p(1 - p)}{E^2}$$

However, if no previous surveys have been taken so that we do not know the
value of $\pi$ or p, then we assume $\pi$ to be equal to 0.5, simply because, other things
being given, the value of $\pi$ being 0.5 will result in a larger sample size than any
other value assumed by $\pi$. Hence, the sample size would be at least as large as
that required for the given conditions. This can be established by the fact
that when $\pi = 0.5$, then $\pi(1 - \pi)$ is 0.5 × 0.5 = 0.25. This value is larger than
$\pi(1 - \pi)$ for any other value of $\pi$. This means that when $\pi = 0.5$ then for a given
value of Z, n would be larger than for any other value of $\pi$. This results in a more
conservative estimate which is desirable.


Example 7: It is desired to estimate the proportion of children watching television
on Saturday mornings, in order to develop a promotional strategy for electronic
games. We want to be 95 per cent confident that our estimate will be within ± 2
per cent of the true population proportion.

(a) What sample size should we take if a previous survey showed that 40 per
cent of children watched television on Saturday mornings?

(b) What would be the sample size, for the same degree of confidence and the
same maximum allowable error, if no such previous survey had been taken?


Solution: (a) In this case, the following values are given:
Z = 1.96 (95% confidence level)
p = 0.4
E = 0.02

Substituting these values in the following formula, we get:

$$n = \frac{Z^2 p(1 - p)}{E^2} = \frac{(1.96)^2 (0.4)(0.6)}{(0.02)^2} = \frac{0.922}{0.0004} = 2304.96 \approx 2305$$

For the sake of accuracy, we always round off to the next higher figure, in case of
answer being a fraction.

(b) In this case, since no previous surveys have been taken, we assume p = 0.5
and follow the earlier procedure.

$$n = \frac{Z^2 p(1 - p)}{E^2} = \frac{(1.96)^2 (0.5)(0.5)}{(0.02)^2} = \frac{0.9604}{0.0004} = 2401$$
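
Both parts of Example 7 follow from one formula, sketched below in Python. The function name `sample_size_for_proportion` and its default `p=0.5` are illustrative assumptions; the default reflects the conservative choice described above for the case where no earlier estimate exists.

```python
import math


def sample_size_for_proportion(z, max_error, p=0.5):
    """n = Z^2 * p(1 - p) / E^2, rounded up; p = 0.5 is the conservative default."""
    n_exact = z ** 2 * p * (1 - p) / max_error ** 2
    # round() guards against floating-point noise before taking the ceiling.
    return math.ceil(round(n_exact, 6))


print(sample_size_for_proportion(1.96, 0.02, p=0.4))  # Example 7(a): 2305
print(sample_size_for_proportion(1.96, 0.02))         # Example 7(b): 2401
```

Any other value of p gives a product p(1 - p) below 0.25, so the default can only overstate the sample size that is actually needed.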


8.4 TRANSFORMATION OF VARIABLE OF THE 
DISCRETE TYPE 


A random variable is defined as a function from $\Omega$ to $\mathbb{R}$, which takes
numerical values. Here, $\Omega$ is the set of possible outcomes of a probability
experiment; consequently a random variable as a function can be written as
$X: \Omega \to \mathbb{R}$, which specifies the method of assigning a numerical value to each
outcome of the probability experiment. Generally, the outcome of a
probability experiment is known only when the experiment is carried
out; therefore X or any other variable name is used for representing this outcome
of the experiment before actually knowing the exact value.


The following examples explain the concept of random variables. When a 
coin is flipped, then the possibility is that either there is the Head (H) or the Tail 
(T), however you can define the random variable associated/related with the flipping 
of coin as X, by means of 


$$X(H) = 0, \quad X(T) = 1$$

Or,

$$Y(H) = 1, \quad Y(T) = -1$$

The numerical values depend on the problem. For example, when
the coin is flipped in a game of chance, heads may win Rupees 5 while
tails loses Rupees 8. The outcomes of the probability experiment, heads and
tails, can then be represented by {5, -8} using the profit-loss concept. This can be
represented using W for the random variable signifying the winnings, by means of,

$$W: \{H, T\} \to \{5, -8\}$$


A discrete random variable is a variable that takes values in a finite or
countably infinite subset of $\mathbb{R}$. The range of the random variable, that is,
the set of values it can take, can be denoted by T.


Example 8: A coin is flipped three times. Define the random variable for the
number of heads appearing in the three flips and find its range.

Solution: Let X be the random variable which represents the number of times 
heads can appear in the three flips. 


Then,

$$X: \Omega \to \{0, 1, 2, 3\} \subset \mathbb{R}$$

The range of X is given by,

T = {0, 1, 2, 3}


Example 9: Two dice are rolled simultaneously. What will be the random variable 
and the range for the maximum roll? 


Solution: Consider the random variable defined as the maximum roll when the
two dice are rolled simultaneously. The largest value that the maximum roll
can take is 6.

Here,
X = Value on the First Die
Y = Value on the Second Die
Z = max (X, Y) is the Random Variable to consider.
The range of Z is given by,
T = {1, 2, 3, 4, 5, 6}

A discrete random variable can be described using:
• The probability mass function.
• The expected value of a random variable.
• The variance of a random variable.

1. For a discrete random variable, the probability mass function can be found
using the equation,

$$f(k) = P(X = k) \quad\text{for } k \in T$$

2. The expected value (expectation, mean) of a random variable can be defined
using the probability mass function,

$$\mu = E(X) = \sum_{k \in T} k\,P(X = k)$$



In this case, each outcome is weighted by its respective
probability. The expected value function is specifically written using the notation
E[.] or E(.), and for the random variable we can use the notation $\mu$ (mu) for mean.

3. The variance can be defined as,

$$\sigma^2 = \mathrm{var}(X) = E[(X - \mu)^2]$$

This definition gives the expected value of the square of the difference of
X from the mean $\mu$. The var (.) function can be applied to any random variable,
and a subscript on $\sigma^2$, as in $\sigma_X^2$, indicates which random variable is being evaluated.
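
The three quantities above can be computed exactly for Examples 8 and 9 with the Python sketch below. It assumes a fair coin and fair dice, so that all outcomes are equally likely; the text does not state this explicitly. The helper names `pmf_of`, `expected_value` and `variance` are illustrative.

```python
from fractions import Fraction
from itertools import product


def pmf_of(outcomes, variable):
    """Probability mass function of variable(outcome) over equally likely outcomes."""
    outcomes = list(outcomes)
    pmf = {}
    for outcome in outcomes:
        k = variable(outcome)
        pmf[k] = pmf.get(k, 0) + Fraction(1, len(outcomes))
    return pmf


def expected_value(pmf):
    """E(X): each value weighted by its probability."""
    return sum(k * p for k, p in pmf.items())


def variance(pmf):
    """var(X) = E[(X - mu)^2]."""
    mu = expected_value(pmf)
    return sum((k - mu) ** 2 * p for k, p in pmf.items())


# Example 8: number of heads in three flips of a fair coin.
heads = pmf_of(product("HT", repeat=3), lambda flips: flips.count("H"))
# Example 9: maximum of two fair dice.
max_roll = pmf_of(product(range(1, 7), repeat=2), max)

print(sorted(heads))                                 # range: [0, 1, 2, 3]
print(expected_value(heads), variance(heads))        # 3/2 3/4
print(expected_value(max_roll), variance(max_roll))  # 161/36 2555/1296
```

Using `Fraction` keeps every probability exact, so the results can be compared directly with hand calculations.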


Definition. Two discrete random variables X and Y are independent if for all x in
the range of X and all y in the range of Y, we have

$$P(X = x, Y = y) = P(X = x)\,P(Y = y)$$

Here P denotes the probability. If X and Y are not independent, then this
may not be true for all combinations of x and y and hence there will be some event
(X = x, Y = y) for which this equation is false.


Example 10: A coin is flipped only once. Let X be the number of heads and Y the
number of tails. Determine whether the discrete random variables X and Y are
independent.

Solution: When the coin is flipped only once, then the probability of getting heads
and tails on the same flip is zero, i.e., the events are mutually exclusive. Let X be
the number of heads and Y be the number of tails. Then,

$$P(X = 1, Y = 1) = 0, \quad\text{whereas}\quad P(X = 1)\,P(Y = 1) = 0.5 \times 0.5 = 0.25$$

Since the two values are not equal, X and Y are not independent.


It can be stated that the joint probability mass function of independent random
variables X and Y factors into the probability mass functions for X and Y.

As per the definition of expected value the following fact can be proved for
the discrete random variables:

If X and Y are independent random variables, then

$$E(XY) = E(X)\,E(Y)$$

Provided that both E (X) and E (Y) exist and are finite.
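
The definition of independence and the product rule for expectations can be checked numerically. The Python sketch below assumes a fair coin and two fair dice; the helper names `marginals` and `is_independent` are illustrative, and a joint probability mass function is represented as a dictionary keyed by (x, y) pairs.

```python
from fractions import Fraction
from itertools import product


def marginals(joint):
    """Marginal pmfs of X and Y from a joint pmf {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return px, py


def is_independent(joint):
    """True if P(X=x, Y=y) = P(X=x) P(Y=y) for every x and y in the ranges."""
    px, py = marginals(joint)
    return all(joint.get((x, y), 0) == px[x] * py[y] for x in px for y in py)


# One flip of a fair coin: X = number of heads, Y = number of tails.
coin = {(1, 0): Fraction(1, 2), (0, 1): Fraction(1, 2)}
# Two fair dice rolled together: X and Y are the two face values.
dice = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

print(is_independent(coin), is_independent(dice))  # False True

e_xy = sum(x * y * p for (x, y), p in dice.items())
px, py = marginals(dice)
e_x = sum(x * p for x, p in px.items())
e_y = sum(y * p for y, p in py.items())
print(e_xy == e_x * e_y)  # True
```

For the single coin flip the pair (1, 1) has joint probability 0 while the product of the marginals is 1/4, which is exactly the failure described in Example 10.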


Check Your Progress 


3. Why is sampling without replacement so called?
4. What do you mean by sampling with replacement?
5. What is the main difference between the above two types of sampling?
6. What are the two types of samples?
7. What is meant by a simple random sample?
8. Define the term discrete random variable.


8.5 ANSWERS TO CHECK YOUR PROGRESS
QUESTIONS

1. It is the technique of assigning probabilities on the basis of personal
judgement. Such assignment may differ from individual to individual and
depends upon the expertise of the person assigning the probabilities.

2. When a random variate can take any value in the given interval $a \le x \le b$,
it is a continuous variate and its distribution is a continuous probability
distribution.


3. Sampling without replacement is so called because in this process, each 
person or element can be selected only once and not replaced for another 
selection. 


4. Sampling with replacement means the process where a specific person or 
element can be selected once and can also be replaced for another selection. 


5. Unlike in the case of sampling without replacement, the same element can 
get reselected in sampling with replacement. 


6. The two types of samples are: 
(a) Probability Samples 
(b) Non-Probability Samples 


7. A simple random sample is the one in which each and every unit of the 
population has an equal chance of being selected into the sample. 


8. A discrete random variable is a variable that takes values in a finite or
countably infinite subset of $\mathbb{R}$. The range of the random variable can be
denoted by T.


8.6 SUMMARY 


• A random variable takes on different values as a result of the outcomes of
a random experiment.

• A probability cannot be less than zero or greater than one, i.e., 0 ≤ pr ≤
1, where pr represents probability.

• The sum of all the probabilities assigned to each value of the random
variable must be exactly one.

• A continuous random variable can take all values in a given interval. A
continuous probability distribution is represented by a smooth curve.
The Cumulative Probability Function (CPF) shows the probability that x
takes a value less than or equal to, say, z and corresponds to the area
under the curve up to z:

$$P(x \le z) = \int_{-\infty}^{z} p(x)\,dx$$

• A sample is a portion of the total population that is considered for study and
analysis.

• Sampling is the process of selecting a sample from the population. It is
usually not technically or economically feasible to take the entire population
for analysis.

• In sampling with replacement, a unit (for example, a piece of paper bearing a
person's name), once selected and taken into consideration, is put back into
the container in such a manner that the same person has the same chance of
being selected again as any other person.

• The third step in the primary data collection process is selecting an adequate
sample. It is necessary to take a representative sample from the population,
since it is extremely costly, time-consuming and cumbersome to do a
complete census.

• If the resources available do not put a heavy constraint on the sample size,
a larger sample would be desirable.

• A smaller sample could adequately represent the population, if the population
consists of mostly homogeneous units. A heterogeneous universe would
require a larger sample.

• Sampling is simply a process of learning about the population on the basis
of a sample drawn from it. Thus, in any sampling technique, instead of
every unit of the universe, only a part of the universe is studied and the
conclusions are drawn on that basis for the entire population.

• Probability sampling methods are those in which every item in the universe
has a known chance, or probability, of being chosen for the sample.

• Non-probability sampling methods are those which do not provide every
item in the universe with a known chance of being included in the sample.
The selection process is, at least partially, subjective (dependent on the
person making the study).

• Quota sampling is a type of judgement sampling and is perhaps the most
commonly used sampling technique in the non-probability category.

• The basic objective of a sample is to draw inferences about the population
from which such sample is drawn. This means that sampling is a technique
which helps us in understanding the parameters or the characteristics of the
universe or the population by examining only a small part of it.

• In general, the larger the sample size, the closer the sample statistic
will be to the population parameter. Hence, the degree of accuracy we
require in our estimate would be one factor influencing our choice of sample
size.
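
The difference between sampling with and without replacement, mentioned in the summary above, can be sketched in a few lines of Python. This is only an illustration: the population of 20 ID numbers and the fixed seed are assumptions made for the example, not data from this unit.

```python
import random

# Hypothetical population: ID numbers 1..20 standing in for slips of paper
# in a container (an assumption made purely for illustration).
population = list(range(1, 21))
rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Without replacement: a selected slip is not returned, so no ID can repeat.
without_replacement = rng.sample(population, k=5)

# With replacement: each slip is returned before the next draw,
# so the same ID may be drawn more than once.
with_replacement = rng.choices(population, k=5)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

With replacement, every draw is made from the full population; without replacement, the pool shrinks by one unit after each draw.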


8.7 KEY WORDS


• Sample: A portion of the total population that is considered for study and
analysis.

• Sample size: It is not possible to consider the entire population to conduct
any study or to do any statistical analysis. Hence, random representative
samples are taken for the purpose of analysis. A sample size of 30 or
more is considered a large sample, while a size below 30 is considered
a small sample.


8.8 SELF-ASSESSMENT QUESTIONS AND
EXERCISES


Short-Answer Questions 


1. What is empirical probability assignment?
2. What is meant by cumulative probability function?
3. Explain briefly the different types of sampling.
4. What is sampling distribution? Give examples.
5. Define population in statistical terms.
6. Explain briefly the terms sample and sampling.
7. What is standard error?


Long-Answer Questions 


1. Discuss the techniques of assigning probabilities.

2. Describe the continuous probability distributions.

3. Give concrete examples of sampling with replacement and sampling without
replacement.

4. A dealer of Toyota cars sold 20,000 Toyota Camry cars last year. He is
interested to know if his customers are satisfied with their purchases. 3000
questionnaires were mailed at random to the purchasers. 1600 responses
were received. 1440 of these responses indicated satisfaction.

(a) What is the population of interest?
(b) What is the sample?
(c) Is the percentage of satisfied customers a parameter or a statistic?

5. Differentiate between probability samples and non-probability samples.
Under what circumstances would non-probability types of samples be more
useful in statistical analyses?

6. How does sampling with replacement differ from sampling without
replacement? Give some examples of situations where sampling has to be
done without replacement.

7. Explain in detail the situations that would require: 

(a) Judgement sampling 
(b) Quota sampling 
(c) Stratified sampling 

8. Differentiate between sampling errors and non-sampling errors. Under what 
circumstances would each type of error occur? What steps can be taken to 
minimize the impact of such errors upon statistical analyses? 

9. Your college has a total population of 5000 students. It is desired to estimate 
the proportion of students who use drugs. 

(a) What type of sampling would be necessary to reach a meaningful 
conclusion regarding the drug use habits of all students? 


(b) What type of sampling would you select so that the sample is most 
representative of the population? 

(c) Drug use being a sensitive issue, what type of questions would you 
include in your questionnaire? What type of questions would you avoid? 
Give reasons. 


10. The lottery method of sample selection is still the most often used method.
Discuss this method in detail and give reasons as to why a sample selected
by the lottery method would be representative of the population.

11. You are the chairman of the Department of Business Administration and
you have been asked to make a report on the current status of students
who graduated with B.S. in Business during the two years of 1989 and
1990. Records have indicated that a total of 425 students graduated from
the department during these two years. The report is to include information
regarding sex of the student, grade point average at the time of graduation,
whether the students completed MBA degree or started a job after B.S.
degree, current employment position and current annual salary. Prepare a
proposal for this survey and include in this proposal:

(a) Objectives of the survey.
(b) Type of sampling technique.
(c) Size of the sample.

12. At New Delhi airport, there is a green channel and a red channel. Passengers
without any custom duty articles can go through the green channel. Some
passengers are stopped for a random check. What type of random sampling
would be appropriate in such situations? Would judgement sampling be
more appropriate? Give reasons.


9.0 INTRODUCTION


A random variable has a probability distribution, which specifies the probability of
Borel subsets of its range. Random variables can be discrete, that is, taking any of
a specified finite or countable list of values (having a countable range), endowed 
with a probability mass function characteristic of the random variable’s probability 
distribution; or continuous, taking any numerical value in an interval or collection 
of intervals (having an uncountable range), via a probability density function that is 
characteristic of the random variable’s probability distribution; or a mixture of 
both types. Two random variables with the same probability distribution can still 
differ in terms of their associations with, or independence from, other random 
variables. The realizations of a random variable, that is, the results of randomly 
choosing values according to the variable’s probability distribution function, are 
called random variates. 

In this unit, you will study the transformation of variable of the
continuous type, the beta distribution, the t distribution, the F distribution and
extensions of the change of variable techniques.


9.1 OBJECTIVES


After going through this unit, you will be able to:
• Understand the transformation of variable of the continuous type
• Discuss the beta distribution
• Briefly describe the t distribution and F distribution


9.2 TRANSFORMATION OF VARIABLE OF THE 
CONTINUOUS TYPE 


A continuous variable is one which can take on an uncountable set of values. A 
variable over a non-empty range of the real numbers is continuous, if it can take 
on any value in that range. Methods of calculus are often used in problems in 
which the variables are continuous, for example in continuous optimization problems. 
In statistical theory, the probability distributions of continuous variables can be 
expressed in terms of probability density functions. In continuous-time dynamics, 
the variable time is treated as continuous, and the equation describing the evolution 
of some variable over time is a differential equation. The instantaneous rate of 
change is a well-defined concept. 


9.2.1 The Beta Distribution 


In probability theory and statistics, the Beta distribution is a family of continuous
probability distributions defined on the interval [0, 1] parameterized by two positive
shape parameters, denoted by α and β, that appear as exponents of the random
variable and control the shape of the distribution.


The beta distribution has been applied to model the behaviour of random 
variables limited to intervals of finite length in a wide variety of disciplines. For 
example, it has been used as a statistical description of allele frequencies in 
population genetics, time allocation in project management or control systems, 
sunshine data, variability of soil properties, proportions of the minerals in rocks in 
stratigraphy and heterogeneity in the probability of HIV transmission. 


In Bayesian inference, the Beta distribution is the conjugate prior probability 
distribution for the Bernoulli, Binomial and Geometric distributions. For example, 
the Beta distribution can be used in Bayesian analysis to describe initial knowledge 
concerning probability of success, such as the probability that a space vehicle will 
successfully complete a specified mission. The Beta distribution is a suitable model 
for the random behavior of percentages and proportions. 

The usual formulation of the Beta distribution is also known as the Beta
distribution of the first kind, whereas Beta distribution of the second kind is an
alternative name for the Beta prime distribution.

The probability density function of the Beta distribution, for 0 < x < 1, and
shape parameters α, β > 0, is a power function of the variable x and of its
reflection (1 − x), as follows:

$$
\begin{aligned}
f(x;\alpha,\beta) &= \text{constant}\cdot x^{\alpha-1}(1-x)^{\beta-1} \\
&= \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du} \\
&= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1} \\
&= \frac{1}{B(\alpha,\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}
\end{aligned}
$$

Where, Γ(z) is the Gamma function. The Beta function, B, is a
normalization constant that ensures that the total probability integrates to 1. In the
above equations x is a realization (an observed value that actually occurred)
of a random process X.
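
As a quick numerical check of the density above, the following Python sketch evaluates f(x; α, β) through the Gamma function and confirms that it integrates to approximately 1. The function names and the midpoint-rule integrator are illustrative choices, not part of the original text.

```python
import math

def beta_function(alpha, beta):
    """B(alpha, beta) expressed through the Gamma function."""
    return math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)

def beta_pdf(x, alpha, beta):
    """Density f(x; alpha, beta) on 0 < x < 1, and 0 outside that interval."""
    if not 0.0 < x < 1.0:
        return 0.0
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / beta_function(alpha, beta)

def integrate(func, lower, upper, steps=10_000):
    """Midpoint-rule integration, adequate for smooth integrands."""
    width = (upper - lower) / steps
    return sum(func(lower + (i + 0.5) * width) for i in range(steps)) * width

# The normalization constant 1/B(alpha, beta) makes the total probability 1.
total_probability = integrate(lambda x: beta_pdf(x, 2.0, 3.0), 0.0, 1.0)
print(round(total_probability, 6))
print(beta_pdf(0.5, 2.0, 2.0))
```

For α = β = 2 the Beta function equals 1/6, so the density at x = 0.5 is 0.25 × 6 = 1.5.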


The Cumulative Distribution Function (CDF) is given below:

$$F(x;\alpha,\beta) = \frac{B(x;\alpha,\beta)}{B(\alpha,\beta)} = I_x(\alpha,\beta)$$

Where, B(x; α, β) is the incomplete beta function and I_x(α, β) is the
regularized incomplete beta function.


The mode of a beta distributed random variable X with α, β > 1 is given by
the following expression:

$$\frac{\alpha-1}{\alpha+\beta-2}$$

When both parameters are less than one (α, β < 1), this is the anti-mode:
the lowest point of the probability density curve.


The median of the beta distribution is the unique real number
$x = I_{1/2}^{[-1]}(\alpha,\beta)$ for which the regularized incomplete beta
function I_x(α, β) = 1/2. There is no general closed form expression for the
median of the beta distribution for arbitrary values of α and β. Closed form
expressions for particular values of the parameters α and β follow:


• For symmetric cases α = β, median = 1/2.

• For α = 1 and β > 0, median = $1 - 2^{-1/\beta}$ (this case is the
mirror-image of the power function [0, 1] distribution).

• For α > 0 and β = 1, median = $2^{-1/\alpha}$ (this case is the power
function [0, 1] distribution).

• For α = 3 and β = 2, median = 0.6142724318676105..., the real solution
to the quartic equation $1 - 8x^3 + 6x^4 = 0$, which lies in [0, 1].

• For α = 2 and β = 3, median = 0.38572756813238945... = 1 − median
(Beta (3, 2)).
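
Because the median has no general closed form, it is usually found numerically. The sketch below is one possible approach, assuming α, β ≥ 1 so that the density stays bounded: it approximates the regularized incomplete beta function by midpoint integration and locates the point where it equals 1/2 by bisection. The function names are illustrative.

```python
import math

def beta_pdf(x, alpha, beta):
    """Beta density written with the Gamma-function normalization."""
    norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return norm * x ** (alpha - 1) * (1 - x) ** (beta - 1)

def beta_cdf(x, alpha, beta, steps=2000):
    """Approximate I_x(alpha, beta) by the midpoint rule (alpha, beta >= 1 assumed)."""
    width = x / steps
    return sum(beta_pdf((i + 0.5) * width, alpha, beta) for i in range(steps)) * width

def beta_median(alpha, beta, tolerance=1e-10):
    """Bisection on the CDF: find x with I_x(alpha, beta) = 1/2."""
    low, high = 0.0, 1.0
    while high - low > tolerance:
        middle = (low + high) / 2
        if beta_cdf(middle, alpha, beta) < 0.5:
            low = middle
        else:
            high = middle
    return (low + high) / 2

median_3_2 = beta_median(3, 2)
print(median_3_2)                                # compare with 0.61427... quoted above
print(1 - 8 * median_3_2**3 + 6 * median_3_2**4)  # quartic residual, close to zero
```

The same routine reproduces the closed forms listed above, for example median = 1/2 when α = β and median = 1 − 2^(−1/β) when α = 1.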
The following are the limits with one parameter finite (non-zero) and the
other approaching these limits:

$$\lim_{\beta\to 0}\text{median} = \lim_{\alpha\to\infty}\text{median} = 1$$

$$\lim_{\alpha\to 0}\text{median} = \lim_{\beta\to\infty}\text{median} = 0$$


A reasonable approximation of the value of the median of the Beta
distribution, for both α and β greater than or equal to one, is given by the
following formula:

$$\text{Median} \approx \frac{\alpha-\tfrac{1}{3}}{\alpha+\beta-\tfrac{2}{3}} \quad \text{for } \alpha,\beta > 1$$


When α, β ≥ 1, the relative error (the absolute error divided by the median)
in this approximation is less than 4%, and for both α ≥ 2 and β ≥ 2 it is less than
1%. The absolute error divided by the difference between the mean and the mode
is similarly small.
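
The stated error bounds can be spot-checked against cases where the exact median is known. The Python sketch below compares the approximation (α − 1/3)/(α + β − 2/3) with the closed form 1 − 2^(−1/β) for α = 1 and with the value quoted earlier for Beta(2, 3). The chosen values of β are arbitrary illustrations.

```python
def median_approximation(alpha, beta):
    """Approximate Beta median: (alpha - 1/3) / (alpha + beta - 2/3)."""
    return (alpha - 1.0 / 3.0) / (alpha + beta - 2.0 / 3.0)

def relative_error(approximate, exact):
    return abs(approximate - exact) / exact

# alpha = 1: the exact median is 1 - 2**(-1/beta).
errors_alpha_one = {}
for beta in (1, 2, 5, 10, 100):
    exact = 1.0 - 2.0 ** (-1.0 / beta)
    errors_alpha_one[beta] = relative_error(median_approximation(1, beta), exact)

# alpha = 2, beta = 3: exact median as quoted in the text.
error_beta_2_3 = relative_error(median_approximation(2, 3), 0.38572756813238945)

for beta, error in errors_alpha_one.items():
    print(f"alpha=1, beta={beta}: relative error {error:.4%}")
print(f"alpha=2, beta=3: relative error {error_beta_2_3:.4%}")
```

In these cases the relative error stays below 4% when α = 1 and below 1% for Beta(2, 3), in line with the bounds stated above.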


The expected value (mean) (μ) of a beta distribution random variable X
with two parameters α and β is a function of only the ratio β/α of these parameters:

$$
\begin{aligned}
\mu = E[X] &= \int_0^1 x\,f(x;\alpha,\beta)\,dx \\
&= \int_0^1 x\,\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}\,dx \\
&= \frac{\alpha}{\alpha+\beta} \\
&= \frac{1}{1+\frac{\beta}{\alpha}}
\end{aligned}
$$

Letting α = β in the above expression one obtains μ = 1/2, showing that for
α = β the mean is at the center of the distribution: it is symmetric. Also, the following
limits can be obtained from the above expression:

$$\lim_{\beta/\alpha\to 0}\mu = 1, \qquad \lim_{\beta/\alpha\to\infty}\mu = 0$$

Therefore, for β/α → 0, or for α/β → ∞, the mean is located at the right
end, x = 1. For these limit ratios, the Beta distribution becomes a one-point
degenerate distribution with a Dirac delta function spike at the right end, x = 1,
with probability 1 and zero probability everywhere else. There is 100% probability
(absolute certainty) concentrated at the right end, x = 1.


Similarly, for β/α → ∞, or for α/β → 0, the mean is located at the left end,
x = 0. The Beta distribution becomes a one-point degenerate distribution with a
Dirac delta function spike at the left end, x = 0, with probability 1 and zero
probability everywhere else. There is 100% probability (absolute certainty)
concentrated at the left end, x = 0. Following are the limits with one parameter
finite (non-zero) and the other approaching these limits:

$$\lim_{\beta\to 0}\mu = \lim_{\alpha\to\infty}\mu = 1$$

$$\lim_{\alpha\to 0}\mu = \lim_{\beta\to\infty}\mu = 0$$

For typical unimodal distributions with centrally located modes, inflexion
points at both sides of the mode and longer tails (Beta(α, β) with
α, β > 2), it is known that the sample mean as an estimate of location is not as
robust as the sample median. The opposite is the case for uniform or 'U-shaped'
bimodal distributions (Beta(α, β) with α, β ≤ 1), with the modes located
at the ends of the distribution.


The logarithm of the Geometric Mean (G_X) of a distribution with random
variable X is the arithmetic mean of ln(X), or equivalently its expected value:

$$\ln G_X = E[\ln X]$$

For a Beta distribution, the expected value integral gives:

$$
\begin{aligned}
E[\ln X] &= \int_0^1 \ln x\, f(x;\alpha,\beta)\,dx \\
&= \int_0^1 \ln x\,\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}\,dx \\
&= \frac{1}{B(\alpha,\beta)}\int_0^1 \frac{\partial\, x^{\alpha-1}(1-x)^{\beta-1}}{\partial\alpha}\,dx \\
&= \frac{1}{B(\alpha,\beta)}\,\frac{\partial}{\partial\alpha}\int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx \\
&= \frac{1}{B(\alpha,\beta)}\,\frac{\partial B(\alpha,\beta)}{\partial\alpha} \\
&= \frac{\partial \ln B(\alpha,\beta)}{\partial\alpha} \\
&= \frac{\partial \ln\Gamma(\alpha)}{\partial\alpha} - \frac{\partial \ln\Gamma(\alpha+\beta)}{\partial\alpha} \\
&= \psi(\alpha) - \psi(\alpha+\beta)
\end{aligned}
$$

Where, ψ is the Digamma function.


Therefore, the geometric mean of a Beta distribution with shape parameters
α and β is the exponential of the Digamma functions of α and β as follows:

$$G_X = e^{E[\ln X]} = e^{\psi(\alpha)-\psi(\alpha+\beta)}$$


For a Beta distribution with equal shape parameters α = β, it follows
that Skewness = 0 and Mode = Mean = Median = 1/2, yet the geometric mean is less
than 1/2: 0 < G_X < 1/2. The reason for this is that the logarithmic transformation
strongly weights the values of X close to zero, as ln(X) strongly tends towards
negative infinity as X approaches zero, while ln(X) flattens towards zero as
X → 1.
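
The digamma expression for the geometric mean can be evaluated numerically. Python's standard library has no digamma function, so the sketch below uses a common approximation (the recurrence ψ(x) = ψ(x + 1) − 1/x followed by an asymptotic series); that choice, and the function names, are assumptions made for illustration.

```python
import math

def digamma(x):
    """Approximate psi(x) for x > 0: shift x upward, then use an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv_sq = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6)
    result += math.log(x) - 0.5 / x - inv_sq * (1 / 12 - inv_sq * (1 / 120 - inv_sq / 252))
    return result

def geometric_mean(alpha, beta):
    """G_X = exp(psi(alpha) - psi(alpha + beta)) for a Beta(alpha, beta) variable."""
    return math.exp(digamma(alpha) - digamma(alpha + beta))

for shape in (0.5, 1, 2, 10):
    print(shape, round(geometric_mean(shape, shape), 4))
```

For the uniform case α = β = 1 this gives G_X = e^(−1) ≈ 0.368, and for every symmetric case shown the geometric mean stays below 1/2, as stated above.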


Along a line α = β, the following limits apply:

$$\lim_{\alpha=\beta\to 0} G_X = 0$$

$$\lim_{\alpha=\beta\to\infty} G_X = \tfrac{1}{2}$$


Following are the limits with one parameter finite (non-zero) and the other
approaching these limits:

$$\lim_{\beta\to 0} G_X = \lim_{\alpha\to\infty} G_X = 1$$

$$\lim_{\alpha\to 0} G_X = \lim_{\beta\to\infty} G_X = 0$$

The accompanying plot shows the difference between the mean and the
geometric mean for shape parameters α and β from zero to 2. Besides the fact
that the difference between them approaches zero as α and β approach infinity
and that the difference becomes large for values of α and β approaching zero, one
can observe an evident asymmetry of the geometric mean with respect to the
shape parameters α and β. The difference between the geometric mean and the
mean is larger for small values of α in relation to β than when exchanging the
magnitudes of β and α.




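The inverse of the Harmonic Mean (H_X) of a distribution with random variable
X is the arithmetic mean of 1/X, or, equivalently, its expected value. Therefore, the
Harmonic Mean (H_X) of a beta distribution with shape parameters α and β is:

$$
\begin{aligned}
H_X &= \frac{1}{E\left[\frac{1}{X}\right]} \\
&= \frac{1}{\int_0^1 \frac{f(x;\alpha,\beta)}{x}\,dx} \\
&= \frac{1}{\int_0^1 \frac{x^{\alpha-1}(1-x)^{\beta-1}}{x\,B(\alpha,\beta)}\,dx} \\
&= \frac{\alpha-1}{\alpha+\beta-1} \quad \text{if } \alpha > 1 \text{ and } \beta > 0
\end{aligned}
$$

The Harmonic Mean (H_X) of a beta distribution with α < 1 is undefined,
because its defining expression is not bounded in [0, 1] for shape parameter α less
than unity.

Letting α = β in the above expression one can obtain the following:

$$H_X = \frac{\alpha-1}{2\alpha-1}$$

This shows that for α = β the harmonic mean ranges from 0, for α = β = 1, to
1/2, for α = β → ∞.


Following are the limits with one parameter finite (non-zero) and the other
approaching these limits:

$$\lim_{\alpha\to 0} H_X = \text{undefined}$$

$$\lim_{\alpha\to 1} H_X = \lim_{\beta\to\infty} H_X = 0$$

$$\lim_{\beta\to 0} H_X = \lim_{\alpha\to\infty} H_X = 1$$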

The Harmonic mean plays a role in maximum likelihood estimation for the
four parameter case, in addition to the geometric mean. When performing
maximum likelihood estimation for the four parameter case, besides the harmonic
mean H_X based on the random variable X, another harmonic mean appears
naturally: the harmonic mean based on the linear transformation (1 − X), the mirror
image of X, denoted by H_(1−X):

$$H_{1-X} = \frac{1}{E\left[\frac{1}{1-X}\right]} = \frac{\beta-1}{\alpha+\beta-1} \quad \text{if } \beta > 1 \text{ and } \alpha > 0$$

The Harmonic mean (H_(1−X)) of a Beta distribution with β < 1 is undefined,
because its defining expression is not bounded in [0, 1] for shape parameter β
less than unity.

Using α = β in the above expression one can obtain the following:

$$H_{1-X} = \frac{\beta-1}{2\beta-1}$$

This shows that for α = β the harmonic mean ranges from 0, for α = β = 1,
to 1/2, for α = β → ∞.

Following are the limits with one parameter finite (non-zero) and the other
approaching these limits:

$$\lim_{\beta\to 0} H_{1-X} = \text{undefined}$$

$$\lim_{\beta\to 1} H_{1-X} = \lim_{\alpha\to\infty} H_{1-X} = 0$$

$$\lim_{\alpha\to 0} H_{1-X} = \lim_{\beta\to\infty} H_{1-X} = 1$$

Although both H_X and H_(1−X) are asymmetric, in the case that both shape
parameters are equal, α = β, the harmonic means are equal: H_X = H_(1−X). This
equality follows from the following symmetry displayed between both harmonic
means:

$$H_X\big(B(\alpha,\beta)\big) = H_{1-X}\big(B(\beta,\alpha)\big) \quad \text{if } \alpha,\beta > 1$$

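
The closed forms for the two harmonic means, and the symmetry between them, can be checked with a short Python sketch. The function names are illustrative, and the numerical integration is only a sanity check on the formula for H_X.

```python
import math

def harmonic_mean_x(alpha, beta):
    """H_X = (alpha - 1) / (alpha + beta - 1), stated for alpha > 1 and beta > 0."""
    if alpha <= 1 or beta <= 0:
        raise ValueError("closed form is stated for alpha > 1 and beta > 0")
    return (alpha - 1) / (alpha + beta - 1)

def harmonic_mean_one_minus_x(alpha, beta):
    """H_(1-X) = (beta - 1) / (alpha + beta - 1), stated for beta > 1 and alpha > 0."""
    if beta <= 1 or alpha <= 0:
        raise ValueError("closed form is stated for beta > 1 and alpha > 0")
    return (beta - 1) / (alpha + beta - 1)

def harmonic_mean_x_numeric(alpha, beta, steps=20_000):
    """1 / E[1/X], with E[1/X] integrated by the midpoint rule (alpha >= 2 assumed)."""
    norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    width = 1.0 / steps
    expected_inverse = sum(
        norm * x ** (alpha - 2) * (1 - x) ** (beta - 1)
        for x in ((i + 0.5) * width for i in range(steps))
    ) * width
    return 1.0 / expected_inverse

print(harmonic_mean_x(3, 2), harmonic_mean_x_numeric(3, 2))
print(harmonic_mean_x(3, 2) == harmonic_mean_one_minus_x(2, 3))  # symmetry
```

Swapping the two shape parameters turns H_X into H_(1−X), which is why the two means coincide when α = β.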
9.2.2 t Distribution 


William S. Gosset (pen name Student) developed a significance test and through
it made a significant contribution to the theory of sampling applicable in the case of
small samples. When the population variance is not known, the test is commonly
known as Student's t-test and is based on the t distribution.


Like the normal distribution, the t distribution is also symmetrical but happens
to be flatter than the normal distribution. Moreover, there is a different t distribution
for every possible sample size. As the sample size gets larger, the shape of the t
distribution loses its flatness and becomes approximately equal to the normal
distribution. In fact, for sample sizes of more than 30, the t distribution is so close
to the normal distribution that we will use the normal to approximate the t
distribution. Thus, when n is small, the t distribution is far from normal, but when
n is infinite, it is identical with the normal distribution.


For applying the t-test in the context of small samples, the t value is calculated
first and then compared with the table value of t at a certain
level of significance for the given degrees of freedom. If the calculated value of t
exceeds the table value (say t_0.05), we infer that the difference is significant at the 5%
level, but if the calculated value of t is less than the corresponding table value, the
difference is not treated as significant.

(ii) The population standard deviation (σ_p) must be unknown.

In using the t-test, we assume the following:
(i) That the population is normal or approximately normal;

(ii) That the observations are independent and the samples are randomly drawn
samples;

(iii) That there is no measurement error;

(iv) That in the case of two samples, population variances are regarded as equal
if equality of the two population means is to be tested.


The following formulae are commonly used to calculate the t value:
(i) To Test the Significance of the Mean of a Random Sample

$$t = \frac{|\bar{X}-\mu|}{SE_{\bar{X}}}$$

Where, X̄ = Mean of the sample
μ = Mean of the universe
SE_X̄ = S.E. of mean in case of small sample and is worked out as follows:

$$SE_{\bar{X}} = \frac{\sigma_s}{\sqrt{n}} = \frac{\sqrt{\dfrac{\sum (X_i-\bar{X})^2}{n-1}}}{\sqrt{n}}$$

and the degrees of freedom = (n − 1)


The above stated formula for t can as well be stated as under:

$$t = \frac{|\bar{X}-\mu|}{\sigma_s}\,\sqrt{n}$$

If we want to work out the probable or fiducial limits of population mean (μ) in
case of small samples, we can use either of the following:

(a) Probable limits with 95% confidence level:

$$\mu = \bar{X} \pm SE_{\bar{X}}\,(t_{0.05})$$

(b) Probable limits with 99% confidence level:

$$\mu = \bar{X} \pm SE_{\bar{X}}\,(t_{0.01})$$

At other confidence levels, the limits can be worked out in a similar manner, taking
the corresponding table value of t just as we have taken t_0.05 in (a) and t_0.01 in (b)
above.


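(ii) To Test the Difference between the Means of the Two Samples

$$t = \frac{|\bar{X}_1-\bar{X}_2|}{SE_{\bar{X}_1-\bar{X}_2}}$$

Where, X̄_1 = Mean of the sample 1
X̄_2 = Mean of the sample 2
SE_(X̄_1 − X̄_2) = Standard Error of difference between two sample means and
is worked out as follows:

$$SE_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\sum (X_{1i}-\bar{X}_1)^2 + \sum (X_{2i}-\bar{X}_2)^2}{n_1+n_2-2}} \times \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$

and the degrees of freedom = (n_1 + n_2 − 2).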


When the actual means are in fraction, then use of assumed means is convenient.
In such a case, the standard deviation of difference, i.e.,

$$\sqrt{\frac{\sum (X_{1i}-\bar{X}_1)^2 + \sum (X_{2i}-\bar{X}_2)^2}{n_1+n_2-2}}$$

can be worked out by the following short-cut formula:

$$\sqrt{\frac{\sum (X_{1i}-A_1)^2 + \sum (X_{2i}-A_2)^2 - n_1(\bar{X}_1-A_1)^2 - n_2(\bar{X}_2-A_2)^2}{n_1+n_2-2}}$$

Where, A_1 = Assumed mean of sample 1
A_2 = Assumed mean of sample 2
X̄_1 = True mean of sample 1
X̄_2 = True mean of sample 2

(iii) To Test the Significance of an Observed Correlation Coefficient

$$t = \frac{r}{\sqrt{1-r^2}} \times \sqrt{n-2}$$

Here, t is based on (n − 2) degrees of freedom.
(iv) In Context of the 'Difference Test'

The difference test is applied in the case of paired data and in this context t is
calculated as under:

$$t = \frac{\bar{X}_{\text{Diff}}-0}{\sigma_{\text{Diff}}/\sqrt{n}} = \frac{\bar{X}_{\text{Diff}}-0}{\sigma_{\text{Diff}}}\,\sqrt{n}$$

Where, X̄_Diff or D̄ = Mean of the differences of sample items.
0 = the value zero on the hypothesis that there is no difference
σ_Diff = standard deviation of difference and is worked out as

$$\sigma_{\text{Diff}} = \sqrt{\frac{\sum (D-\bar{D})^2}{n-1}} \quad \text{or} \quad \sqrt{\frac{\sum D^2 - (\bar{D})^2\,n}{n-1}}$$

D = differences
n = number of pairs in two samples; t is based on (n − 1)
degrees of freedom.


The following examples illustrate the application of the t-test using the above
stated formulae.


Example 1: A sample of 10 measurements of the diameter of a sphere gave a
mean X̄ = 4.38 inches and a standard deviation σ = 0.06 inches. Find (a) 95%
and (b) 99% confidence limits for the actual diameter.


Solution: On the basis of the given data the standard error of mean

$$= \frac{\sigma_s}{\sqrt{n-1}} = \frac{0.06}{\sqrt{10-1}} = \frac{0.06}{3} = 0.02$$

Assuming the sample mean 4.38 inches to be the population mean, the required
limits are as follows:

(i) 95% confidence limits = X̄ ± SE_X̄ (t_0.05) with 9 degrees of freedom
= 4.38 ± 0.02(2.262)
= 4.38 ± 0.04524
i.e., 4.335 to 4.425

(ii) 99% confidence limits = X̄ ± SE_X̄ (t_0.01) with 9 degrees of freedom
= 4.38 ± 0.02(3.25) = 4.38 ± 0.0650
i.e., 4.3150 to 4.4450.
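
The arithmetic of Example 1 can be reproduced with a short Python sketch. The t values 2.262 and 3.25 are the table values quoted in the example; the variable and function names are illustrative. The divisor √(n − 1) follows the example as printed, which appears to treat the quoted standard deviation as computed with divisor n.

```python
import math

sample_size = 10
sample_mean = 4.38   # inches
sample_sd = 0.06     # inches, as quoted in Example 1

# Table values of t for 9 degrees of freedom, taken from the example.
T_95_DF9 = 2.262
T_99_DF9 = 3.25

# Standard error as computed in the example: sd / sqrt(n - 1).
standard_error = sample_sd / math.sqrt(sample_size - 1)

def confidence_limits(mean, se, t_value):
    """Return (lower, upper) limits: mean -/+ t * SE."""
    return mean - t_value * se, mean + t_value * se

limits_95 = confidence_limits(sample_mean, standard_error, T_95_DF9)
limits_99 = confidence_limits(sample_mean, standard_error, T_99_DF9)
print("SE  =", round(standard_error, 4))
print("95% =", tuple(round(v, 4) for v in limits_95))
print("99% =", tuple(round(v, 4) for v in limits_99))
```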


Example 2: The specimens of copper wire drawn from a large lot have the
following breaking strength (in kg. wt.):


578, 572, 570, 568, 572, 578, 570, 572, 596, 544


Test whether the mean breaking strength of the lot may be taken to be 578 kg. wt.
Solution: We take the hypothesis that there is no difference between the mean
breaking strength of the sample and the given breaking strength of the universe.
In other words we can write, H_0: μ = X̄, H_a: μ ≠ X̄. Then on the basis of the
sample data, the mean and standard deviation have been worked out as under:


S. No.     X_i     (X_i − X̄)     (X_i − X̄)²
  1        578          6             36
  2        572          0              0
  3        570         -2              4
  4        568         -4             16
  5        572          0              0
  6        578          6             36
  7        570         -2              4
  8        572          0              0
  9        596         24            576
 10        544        -28            784

n = 10   ΣX_i = 5720                 Σ(X_i − X̄)² = 1456

$$\bar{X} = \frac{\sum X_i}{n} = \frac{5720}{10} = 572$$

$$\sigma_s = \sqrt{\frac{\sum (X_i-\bar{X})^2}{n-1}} = \sqrt{\frac{1456}{10-1}} = \sqrt{\frac{1456}{9}} = 12.72$$



$$SE_{\bar{X}} = \frac{\sigma_s}{\sqrt{n}} = \frac{12.72}{\sqrt{10}} = \frac{12.72}{3.16} = 4.03$$

$$t = \frac{|\bar{X}-\mu|}{SE_{\bar{X}}} = \frac{|572-578|}{4.03} = 1.488$$


Degrees of freedom = n − 1 = 9


At 5% level of significance for 9 degrees of freedom, the table value of t = 2.262
for a two-tailed test.


The calculated value of t is less than its table value and hence the difference is
insignificant. The mean breaking strength of the lot may be taken to be 578 kg.
wt. with 95% confidence level.
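
A minimal Python sketch of the one-sample t calculation in Example 2 follows. The critical value 2.262 is the table value quoted in the example, and the names are illustrative. Because the example rounds intermediate values, the unrounded t (about 1.49) differs slightly from the printed 1.488.

```python
import math

breaking_strengths = [578, 572, 570, 568, 572, 578, 570, 572, 596, 544]
hypothesized_mean = 578
T_TABLE_5PCT_DF9 = 2.262  # two-tailed table value quoted in Example 2

n = len(breaking_strengths)
sample_mean = sum(breaking_strengths) / n
sum_squared_dev = sum((x - sample_mean) ** 2 for x in breaking_strengths)

# Sample standard deviation with divisor (n - 1), then the standard error.
sample_sd = math.sqrt(sum_squared_dev / (n - 1))
standard_error = sample_sd / math.sqrt(n)

t_value = abs(sample_mean - hypothesized_mean) / standard_error
is_significant = t_value > T_TABLE_5PCT_DF9

print("mean =", sample_mean, " sum of squares =", sum_squared_dev)
print("sd =", round(sample_sd, 2), " SE =", round(standard_error, 2))
print("t =", round(t_value, 3), " significant:", is_significant)
```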


Example 3: Samples of sales in similar shops in two towns are taken for a new
product with the following results:


           Mean sales    Variance    Size of sample
Town A         57           5.3             5
Town B         61           4.8             7


Is there any evidence of difference in sales in the two towns?


Solution: We take the hypothesis that there is no difference between the two
sample means concerning sales in the two towns. In other words,
H_0: X̄_1 = X̄_2, H_a: X̄_1 ≠ X̄_2. Then, we work out the corresponding t value as
follows:

$$t = \frac{|\bar{X}_1-\bar{X}_2|}{SE_{\bar{X}_1-\bar{X}_2}}$$

Where, X̄_1 = Mean of the sample concerning Town A
X̄_2 = Mean of the sample concerning Town B
SE_(X̄_1 − X̄_2) = Standard Error of the difference between two means.

$$SE_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\sum (X_{1i}-\bar{X}_1)^2 + \sum (X_{2i}-\bar{X}_2)^2}{n_1+n_2-2}} \times \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$

Taking each sum of squared deviations as the sample size times the variance
(5 × 5.3 = 26.5 and 7 × 4.8 = 33.6), the standard error is:

$$SE_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{26.5+33.6}{5+7-2}} \times \sqrt{\frac{1}{5}+\frac{1}{7}} = (2.45)(0.58) = 1.421$$

Hence,

$$t = \frac{|57-61|}{1.421} = \frac{4}{1.421} = 2.82$$


Degrees of freedom = (n_1 + n_2 − 2) = (5 + 7 − 2) = 10
Table value of t at 5% level of significance for 10 degrees of freedom is 2.228, for
a two-tailed test.


The calculated value of t is greater than its table value. Hence, the null hypothesis
is rejected and the difference is significant.
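
The two-sample calculation of Example 3 can be sketched in Python from the summary figures alone. One assumption is made and should be read as such: each sum of squared deviations is taken as sample size × variance, which is the reading that reproduces the standard error of 1.421 printed in the example. Without rounding the intermediate values, t comes out near 2.79 rather than 2.82; the conclusion is unchanged.

```python
import math

def pooled_t_from_summary(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t from summary statistics.

    Assumption: each variance was computed with divisor n, so the sum of
    squared deviations of a sample is n * variance.
    """
    sum_squares = n1 * var1 + n2 * var2
    degrees_of_freedom = n1 + n2 - 2
    pooled_sd = math.sqrt(sum_squares / degrees_of_freedom)
    standard_error = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    t_value = abs(mean1 - mean2) / standard_error
    return t_value, standard_error, degrees_of_freedom

T_TABLE_5PCT_DF10 = 2.228  # two-tailed table value quoted in Example 3

t_value, standard_error, df = pooled_t_from_summary(57, 5.3, 5, 61, 4.8, 7)
print("SE =", round(standard_error, 3), " t =", round(t_value, 3), " df =", df)
print("significant:", t_value > T_TABLE_5PCT_DF10)
```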


Example 4: The sales data of an item in six shops before and after a special
promotional campaign are:


Shops                               A     B     C     D     E     F
Before the promotional campaign    53    28    31    48    50    42
After the campaign                 58    29    30    55    56    45


Can the campaign be judged to be a success? Test at 5% level of significance.


Solution: We take the hypothesis that the campaign does not bring any improvement
in sales. We can thus write: H_0: the mean difference in sales is zero;
H_a: the mean difference is greater than zero (sales after the campaign are higher).


In order to judge this, we apply the 'difference test'. For this purpose we calculate
the mean and standard deviation of differences in two sample items as follows:


Shops   Sales before   Sales after   Difference = D           (D − D̄)   (D − D̄)²
        campaign       campaign      (i.e., increase or
                                     decrease after the
                                     campaign)
A           53             58             +5                   +1.5        2.25
B           28             29             +1                   -2.5        6.25
C           31             30             -1                   -4.5       20.25
D           48             55             +7                   +3.5       12.25
E           50             56             +6                   +2.5        6.25
F           42             45             +3                   -0.5        0.25

n = 6                                   ΣD = 21                        Σ(D − D̄)² = 47.50


Mean of difference:

$$\bar{X}_{\text{Diff}} = \frac{\sum D}{n} = \frac{21}{6} = 3.5$$

Standard deviation of difference:

$$\sigma_{\text{Diff}} = \sqrt{\frac{\sum (D-\bar{D})^2}{n-1}} = \sqrt{\frac{47.50}{6-1}} = 3.08$$

$$t = \frac{\bar{X}_{\text{Diff}}-0}{\sigma_{\text{Diff}}}\,\sqrt{n} = \frac{3.5-0}{3.08} \times \sqrt{6} = 1.14 \times 2.45 = 2.793$$

Degrees of freedom = (n − 1) = (6 − 1) = 5


Table value of t at 5% level of significance for 5 degrees of freedom = 2.015 for
one-tailed test.


Since the calculated value of t is greater than its table value, the difference is
significant. Thus, the null hypothesis is rejected and the special promotional campaign
can be taken as a success.
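
The paired 'difference test' of Example 4 can be reproduced with the following Python sketch. The table value 2.015 is the one quoted in the example and the names are illustrative. Without intermediate rounding, t is about 2.78 rather than the printed 2.793.

```python
import math

sales_before = [53, 28, 31, 48, 50, 42]
sales_after = [58, 29, 30, 55, 56, 45]
T_TABLE_5PCT_DF5 = 2.015  # one-tailed table value quoted in Example 4

# Differences D = after - before for each shop.
differences = [after - before for before, after in zip(sales_before, sales_after)]
n = len(differences)
mean_difference = sum(differences) / n

# Standard deviation of the differences with divisor (n - 1).
sum_squared_dev = sum((d - mean_difference) ** 2 for d in differences)
sd_difference = math.sqrt(sum_squared_dev / (n - 1))

t_value = (mean_difference - 0) / sd_difference * math.sqrt(n)

print("D =", differences, " mean =", mean_difference)
print("sd =", round(sd_difference, 2), " t =", round(t_value, 3))
print("campaign effect significant:", t_value > T_TABLE_5PCT_DF5)
```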


Example 5: Memory capacity of 9 students was tested before and after training.
From the following scores, state whether the training was effective or not.


Student     1    2    3    4    5    6    7    8    9
Before     10   15    9    3    7   12   16   17    4
After      12   17    8    5    6   11   18   20    3


Solution: We take the hypothesis that training was not effective. We can write,
H_0: X̄_A = X̄_B, H_a: X̄_A > X̄_B, where X̄_A and X̄_B denote the mean scores after
and before training. We apply the difference test, for which purpose we first
calculate the mean and standard deviation of difference as follows:


Students   Before   After   Difference = D    D²
   1         10       12          2            4
   2         15       17          2            4
   3          9        8         -1            1
   4          3        5          2            4
   5          7        6         -1            1
   6         12       11         -1            1
   7         16       18          2            4
   8         17       20          3            9
   9          4        3         -1            1

n = 9                          ΣD = 7       ΣD² = 29

$$\bar{D} = \frac{\sum D}{n} = \frac{7}{9} = 0.78$$

$$\sigma_{\text{Diff}} = \sqrt{\frac{\sum D^2 - (\bar{D})^2\,n}{n-1}} = \sqrt{\frac{29-(0.78)^2 \times 9}{9-1}} = 1.71$$

$$t = \frac{0.78-0}{1.71/\sqrt{9}} = 1.369$$


Degrees of freedom = (n − 1) = (9 − 1) = 8
Table value of t at 5% level of significance for 8 degrees of freedom
= 1.860 for one-tailed test.


Since the calculated value of t is less than its table value, the difference is insignificant
and the null hypothesis cannot be rejected. Hence it can be inferred that the training
was not effective.
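
Example 5 uses the short-cut form of the standard deviation of differences, based on ΣD². The Python sketch below computes it that way and also by the direct deviation formula to show the two agree. The names are illustrative; the unrounded t is about 1.36, slightly below the printed 1.369.

```python
import math

scores_before = [10, 15, 9, 3, 7, 12, 16, 17, 4]
scores_after = [12, 17, 8, 5, 6, 11, 18, 20, 3]
T_TABLE_5PCT_DF8 = 1.860  # one-tailed table value quoted in Example 5

differences = [a - b for b, a in zip(scores_before, scores_after)]
n = len(differences)
mean_difference = sum(differences) / n
sum_of_squares = sum(d * d for d in differences)  # sum of D^2

# Short-cut form: sqrt((sum D^2 - n * mean^2) / (n - 1)).
sd_shortcut = math.sqrt((sum_of_squares - n * mean_difference ** 2) / (n - 1))

# Direct form: sqrt(sum (D - mean)^2 / (n - 1)); should match the short-cut.
sd_direct = math.sqrt(sum((d - mean_difference) ** 2 for d in differences) / (n - 1))

t_value = mean_difference / (sd_shortcut / math.sqrt(n))
print("sum D =", sum(differences), " sum D^2 =", sum_of_squares)
print("sd =", round(sd_shortcut, 3), " t =", round(t_value, 3))
print("training effect significant:", t_value > T_TABLE_5PCT_DF8)
```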


Example 6: It was found that the coefficient of correlation between two variables
calculated from a sample of 25 items was 0.37. Test the significance of r at 5%
level with the help of t-test.


Solution: To test the significance of r through t-test, we use the following formula
for calculating t value:

$$t = \frac{r}{\sqrt{1-r^2}} \times \sqrt{n-2} = \frac{0.37}{\sqrt{1-(0.37)^2}} \times \sqrt{25-2} = 1.903$$


Degrees of freedom = (n − 2) = (25 − 2) = 23


The table value of t at 5% level of significance for 23 degrees of freedom is
2.069 for a two-tailed test.


The calculated value of t is less than its table value, hence r is insignificant.
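
The test of a correlation coefficient in Example 6 reduces to a single formula, sketched below in Python. The function name is illustrative and 2.069 is the table value quoted in the example. The unrounded result is about 1.91, close to the printed 1.903.

```python
import math

def correlation_t(r, n):
    """t = r / sqrt(1 - r^2) * sqrt(n - 2), based on (n - 2) degrees of freedom."""
    return r / math.sqrt(1 - r * r) * math.sqrt(n - 2)

T_TABLE_5PCT_DF23 = 2.069  # two-tailed table value quoted in Example 6

t_value = correlation_t(0.37, 25)
print("t =", round(t_value, 3), " df =", 25 - 2)
print("r significant:", t_value > T_TABLE_5PCT_DF23)
```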


Example 7: A group of seven week old chickens reared on high protein diet
weigh 12, 15, 11, 16, 14, 14 and 16 ounces; a second group of five chickens
similarly treated except that they receive a low protein diet weigh 8, 10, 14, 10
and 13 ounces. Test at 5% level whether there is significant evidence that additional
protein has increased the weight of chickens. (Use assumed mean (or A_1) = 10 for
the sample of 7 and assumed mean (or A_2) = 8 for the sample of 5 chickens in
your calculation).


Solution: We take the hypothesis that additional protein has not increased the
weight of the chickens. We can write, H_0: X̄_1 = X̄_2, H_a: X̄_1 > X̄_2.


Applying t-test, we work out the value of t for measuring the significance of two
sample means as follows:

$$t = \frac{\bar{X}_1-\bar{X}_2}{SE_{\bar{X}_1-\bar{X}_2}}$$

Calculation of X̄_1, X̄_2 and the standard error can be done as under:


X_1i   (X_1i − A_1)   (X_1i − A_1)²      X_2i   (X_2i − A_2)   (X_2i − A_2)²
        A_1 = 10                                  A_2 = 8
 12          2               4             8          0               0
 15          5              25            10          2               4
 11          1               1            14          6              36
 16          6              36            10          2               4
 14          4              16            13          5              25
 14          4              16
 16          6              36

n_1 = 7   Σ(X_1i − A_1)   Σ(X_1i − A_1)²  n_2 = 5  Σ(X_2i − A_2)  Σ(X_2i − A_2)²
            = 28            = 134                    = 15            = 69

$$\bar{X}_1 = A_1 + \frac{\sum (X_{1i}-A_1)}{n_1} = 10 + \frac{28}{7} = 14$$

Similarly,

$$\bar{X}_2 = A_2 + \frac{\sum (X_{2i}-A_2)}{n_2} = 8 + \frac{15}{5} = 11$$

Hence,

$$
\begin{aligned}
SE_{\bar{X}_1-\bar{X}_2} &= \sqrt{\frac{\sum (X_{1i}-A_1)^2 + \sum (X_{2i}-A_2)^2 - n_1(\bar{X}_1-A_1)^2 - n_2(\bar{X}_2-A_2)^2}{n_1+n_2-2}} \times \sqrt{\frac{1}{n_1}+\frac{1}{n_2}} \\
&= \sqrt{\frac{134 + 69 - 7(4)^2 - 5(3)^2}{7+5-2}} \times \sqrt{\frac{1}{7}+\frac{1}{5}} \\
&= (2.14)(0.59) = 1.2626
\end{aligned}
$$

We now calculate the value of t:

$$t = \frac{\bar{X}_1-\bar{X}_2}{SE_{\bar{X}_1-\bar{X}_2}} = \frac{14-11}{1.2626} = 2.376$$


Degrees of freedom = (n_1 + n_2 − 2) = (7 + 5 − 2) = 10


The table value of t at 5% level of significance for 10 degrees of freedom = 1.812
for one-tailed test.


The calculated value of t is higher than its table value and hence the difference is
significant, which means the null hypothesis is rejected. It can therefore be concluded
that additional protein has increased the weight of chickens.
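
The assumed-mean short-cut used in Example 7 gives the same sum of squares as working from the true means. The Python sketch below checks that equivalence and recomputes t. The names are illustrative, 1.812 is the table value quoted in the example, and the unrounded t is about 2.39 against the printed 2.376.

```python
import math

high_protein = [12, 15, 11, 16, 14, 14, 16]
low_protein = [8, 10, 14, 10, 13]
ASSUMED_MEAN_1, ASSUMED_MEAN_2 = 10, 8
T_TABLE_5PCT_DF10 = 1.812  # one-tailed table value quoted in Example 7

n1, n2 = len(high_protein), len(low_protein)

# True means recovered from the assumed means: mean = A + sum(X - A) / n.
mean1 = ASSUMED_MEAN_1 + sum(x - ASSUMED_MEAN_1 for x in high_protein) / n1
mean2 = ASSUMED_MEAN_2 + sum(x - ASSUMED_MEAN_2 for x in low_protein) / n2

# Short-cut sum of squares: deviations from assumed means, corrected by n(mean - A)^2.
shortcut_ss = (
    sum((x - ASSUMED_MEAN_1) ** 2 for x in high_protein)
    + sum((x - ASSUMED_MEAN_2) ** 2 for x in low_protein)
    - n1 * (mean1 - ASSUMED_MEAN_1) ** 2
    - n2 * (mean2 - ASSUMED_MEAN_2) ** 2
)

# Direct sum of squares about the true means, for comparison.
direct_ss = sum((x - mean1) ** 2 for x in high_protein) + sum(
    (x - mean2) ** 2 for x in low_protein
)

standard_error = math.sqrt(shortcut_ss / (n1 + n2 - 2)) * math.sqrt(1 / n1 + 1 / n2)
t_value = (mean1 - mean2) / standard_error

print("means:", mean1, mean2, " sums of squares:", shortcut_ss, direct_ss)
print("SE =", round(standard_error, 4), " t =", round(t_value, 3))
print("weight gain significant:", t_value > T_TABLE_5PCT_DF10)
```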


9.2.3 F Distribution 


In business decisions, we are often involved in determining if there are significant 
differences among various sample means, from which conclusions can be drawn 
about the differences among various population means. In the previous chapters, 
we discussed and evaluated the differences between two sample means. But, 
what if we have to compare more than 2 sample means? For example, we may be 
interested to find out if there are any significant differences in the average sales 
figures of 4 different salesman employed by the same company, or we may be 
interested to find out if the average monthly expenditures of a family of 4 in 5 
different localities are similar or not, or the telephone company may be interested 
in checking, whether there are any significant differences in the average number of 
requests for information received in a given day among the 5 areas of New York 
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City, and so on. The methodology used for such types of determinations is known Ti eoo of 
š å ariable of the 
as Analysis of Variance. Continuous Type 


This technique is one of the most powerful techniques in statistical analysis 


and was developed by R.A. Fisher. It is also called the F-Test. NOTES 


There are two types of classifications involved in the analysis of variance.
The one-way analysis of variance refers to the situations when only one factor or
variable is considered. For example, in testing for differences in sales for three
salesmen, we are considering only one factor, which is the salesman’s selling ability.
In the second type of classification, the response variable of interest may be affected 
by more than one factor. For example, the sales may be affected not only by the 
salesman’s selling ability, but also by the price charged or the extent of advertising 
in a given area. 


For the sake of simplicity and necessity, our discussion will be limited to 
One-way Analysis of Variance. 


The null hypothesis, that we are going to test, is based upon the assumption 
that there is no significant difference among the means of different populations. 
For example, if we are testing for differences in the means of k populations, then, 


\[ H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k \]


The alternate hypothesis ($H_1$) will state that at least two means are different
from each other. In order to accept the null hypothesis, all means must be equal.
Even if one mean is not equal to the others, then we cannot accept the null 
hypothesis. The simultaneous comparison of several population means is called 
Analysis of Variance or ANOVA. 


Assumptions 


The methodology of ANOVA is based on the following assumptions. 


(i) Each sample of size n is drawn randomly and each sample is independent of
the other samples.

(ii) The populations are normally distributed.

(iii) The populations from which the samples are drawn have equal variances.
This means that:

\[ \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \dots = \sigma_k^2, \text{ for } k \text{ populations.} \]

The Rationale Behind Analysis of Variance


Why do we call it the Analysis of Variance, even though we are testing for means?
Why not simply call it the Analysis of Means? How do we test for means by
analysing the variances? As a matter of fact, in order to determine if the means of
several populations are equal, we do consider the measure of variance, $\sigma^2$.

The estimate of population variance, $\sigma^2$, is computed by two different
estimates of $\sigma^2$, each one by a different method. One approach is to compute an
estimator of $\sigma^2$ in such a manner that even if the population means are not equal, it
will have no effect on the value of this estimator. This means that the differences in
the values of the population means do not alter the value of $\sigma^2$ as calculated by a
given method. This estimator of $\sigma^2$ is the average of the variances found within
each of the samples. For example, if we take 10 samples of size n, then each
sample will have a mean and a variance. Then, the mean of these 10 variances
would be considered as an unbiased estimator of $\sigma^2$, the population variance, and
its value remains appropriate irrespective of whether the population means are
equal or not. This is really done by pooling all the sample variances to estimate a
common population variance, which is the average of all sample variances. This
common variance is known as variance within samples or $\sigma^2_{\text{within}}$.


The second approach to calculate the estimate of $\sigma^2$ is based upon the
Central Limit Theorem and is valid only under the null hypothesis assumption that
all the population means are equal. This means that in fact, if there are no differences
among the population means, then the computed value of $\sigma^2$ by the second
approach should not differ significantly from the computed value of $\sigma^2$ by the first
approach.

Hence, if these two values of $\sigma^2$ are approximately the same, then we can
decide to accept the null hypothesis.


The second approach results in the following computation. 


Based upon the Central Limit Theorem, we have previously found that the
standard error of the sample means is calculated by:

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

or, the variance would be:

\[ \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \]

or, $\sigma^2 = n\,\sigma_{\bar{x}}^2$

Thus, by knowing the square of the standard error of the mean $(\sigma_{\bar{x}})^2$, we
could multiply it by n and obtain a precise estimate of $\sigma^2$. This approach of estimating
$\sigma^2$ is known as $\sigma^2_{\text{between}}$. Now, if the null hypothesis is true, that is, if all population
means are equal, then the $\sigma^2_{\text{between}}$ value should be approximately the same as the
$\sigma^2_{\text{within}}$ value. A significant difference between these two values would lead us to
conclude that this difference is the result of differences between the population means.


But, how do we know that any difference between these two values is 
significant or not? How do we know whether this difference, if any, is simply due 
to random sampling error or due to actual differences among the population means? 


R.A. Fisher developed a Fisher test or F-test to answer the above question.
He determined that the difference between $\sigma^2_{\text{between}}$ and $\sigma^2_{\text{within}}$ values could be
expressed as a ratio to be designated as the F-value, so that:

\[ F = \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{within}}} \]

In the above case, if the population means are exactly the same, then $\sigma^2_{\text{between}}$
will be equal to the $\sigma^2_{\text{within}}$ and the value of F will be equal to 1.


However, because of sampling errors and other variations, some disparity
between these two values will be there, even when the null hypothesis is true,
meaning that all population means are equal. The extent of disparity between the
two variances and consequently, the value of F, will influence our decision on
whether to accept or reject the null hypothesis. It is logical to conclude that, if the
population means are not equal, then their sample means will also vary greatly
from one another, resulting in a larger value of $\sigma^2_{\text{between}}$ and hence a larger value of
F ($\sigma^2_{\text{within}}$ is based only on sample variances and not on sample means and hence,
is not affected by differences in sample means). Accordingly, the larger the value
of F, the more likely the decision to reject the null hypothesis. But, how large must the
value of F be so as to reject the null hypothesis? The answer is that the computed
value of F must be larger than the critical value of F, given in the table for a given
level of significance and calculated number of degrees of freedom. (The F
distribution is a family of curves, so that there are different curves for different
degrees of freedom).


Degrees of Freedom 


We have talked about the F-distribution being a family of curves, each curve
reflecting the degrees of freedom relative to both $\sigma^2_{\text{between}}$ and $\sigma^2_{\text{within}}$. This means
that the degrees of freedom are associated both with the numerator as well as
with the denominator of the F-ratio.


(i) The Numerator: Since the variance between samples, $\sigma^2_{\text{between}}$, comes
from many samples and if there are k number of samples, then the degrees
of freedom associated with the numerator would be (k - 1).

(ii) The Denominator: It is the mean variance of the variances of k samples
and since each variance in each sample is associated with the size of the
sample (n), then the degrees of freedom associated with each sample would
be (n - 1). Hence, the total degrees of freedom would be the sum of degrees
of freedom of k samples or

df = k(n - 1), when each sample is of size n.

The F-Distribution


The major characteristics of the F-distribution are as follows: 


(i) Unlike normal distribution, which is only one type of curve irrespective of 
the value of the mean and the standard deviation, the F distribution is a 
family of curves. A particular curve is determined by two parameters. These 
are the degrees of freedom in the numerator and the degrees of freedom in 
the denominator. The shape of the curve changes as the number of degrees 
of freedom changes. 


(ii) It is a continuous distribution and the value of F cannot be negative.
(iii) The curve representing the F distribution is positively skewed. 
(iv) The values of F theoretically range from zero to infinity. 


A diagram of F distribution curve is shown below.

[Figure: F distribution curve; horizontal axis F, starting at 0]

The rejection region is only in the right end tail of the curve because, unlike the Z
distribution and t distribution, which had negative values for areas below the mean,
the F distribution has only positive values by definition and only positive values of F
that are larger than the critical values of F will lead to a decision to reject the null
hypothesis.


Computation of F 


Since F ratio contains only two elements, which are the variance between the 
samples and the variance within the samples, the concepts of which have been 
discussed before, let us recapitulate the calculation of these variances. 


If all the means of samples were exactly equal and all samples were exactly 
representative of their respective populations so that all the sample means, were 
exactly equal to each other and to the population mean, then there will be no 
variance. However, this can never be the case. We always have variation, both 
between samples and within samples, even if we take these samples randomly and 
from the same population. This variation is known as the total variation. 


The total variation, designated by $\sum (X - \bar{\bar{X}})^2$, where X represents individual
observations for all samples and $\bar{\bar{X}}$ is the grand mean of all sample means and
equals ($\mu$), the population mean, is also known as the total sum of squares or
SST, and is simply the sum of squared differences between each observation and
the overall mean. This total variation represents the contribution of two elements.
These elements are:


(A) Variance between Samples: The variance between samples may be due to 
the effect of different treatments, meaning that the population means may be 
affected by the factor under consideration, thus, making the population means 
actually different, and some variance may be due to the inter-sample variability. 
This variance is also known as the sum of squares between samples. Let this sum 
of squares be designated as SSB. 

Then, SSB is calculated by the following steps: 

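(i) Take k samples of size n each and calculate the mean of each sample, i.e.,
$\bar{X}_1, \bar{X}_2, \bar{X}_3, \dots, \bar{X}_k$.

(ii) Calculate the grand mean $\bar{\bar{X}}$ of the distribution of these sample means, so that,

\[ \bar{\bar{X}} = \frac{\sum_{i=1}^{k} \bar{X}_i}{k} \]

(iii) Take the difference between the means of the various samples and the grand
mean, i.e.,

\[ (\bar{X}_1 - \bar{\bar{X}}),\ (\bar{X}_2 - \bar{\bar{X}}),\ (\bar{X}_3 - \bar{\bar{X}}),\ \dots,\ (\bar{X}_k - \bar{\bar{X}}) \]

(iv) Square these deviations or differences individually, multiply each of these
squared deviations by its respective sample size and sum up all these products, so
that we get:

\[ \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{\bar{X}})^2, \text{ where } n_i = \text{size of the } i\text{th sample.} \]

This will be the value of the SSB.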


However, if the individual observations of all samples are not available, and only
the various means of these samples are available, where the samples are either of
the same size n or different sizes, $n_1, n_2, n_3, \dots, n_k$, then the value of SSB can be
calculated as:

\[ SSB = n_1 (\bar{X}_1 - \bar{\bar{X}})^2 + n_2 (\bar{X}_2 - \bar{\bar{X}})^2 + \dots + n_k (\bar{X}_k - \bar{\bar{X}})^2 \]

where,
$n_1$ = Number of items in sample 1
$n_2$ = Number of items in sample 2
$n_k$ = Number of items in sample k
$\bar{X}_1$ = Mean of sample 1
$\bar{X}_2$ = Mean of sample 2
$\bar{X}_k$ = Mean of sample k
$\bar{\bar{X}}$ = Grand mean or average of all items in all samples.


(v) Divide SSB by the degrees of freedom, which are (k - 1), where k is the
number of samples, and this would give us the value of $\sigma^2_{\text{between}}$, so that,

\[ \sigma^2_{\text{between}} = \frac{SSB}{(k - 1)} \]

(This is also known as mean square between samples or MSB).


(B) Variance within Samples: Even though each observation in a given sample 
comes from the same population and is subjected to the same treatment, some 
chance variation can still occur. This variance may be due to sampling errors or 
other natural causes. This variance or sum of squares is calculated through the 
following steps: 


(i) Calculate the mean value of each sample, i.e., $\bar{X}_1, \bar{X}_2, \bar{X}_3, \dots, \bar{X}_k$.


(ii) Take one sample at a time and take the deviation of each item in the sample 
from its mean. Do this for all the samples, so that we would have a difference 
between each value in each sample and their respective means for all values in all 
samples. 


(iii) Square these differences and take a total sum of all these squared differences
(or deviations). This sum is also known as SSW or sum of squares within samples. 


(iv) Divide this SSW by the corresponding degrees of freedom. The degrees of 
freedom are obtained by subtracting the total number of samples from the total 
number of items. Thus, if N is the total number of items or observations, and k is 
the number of samples, then, 


df = (N - k)

These are the degrees of freedom within samples. (If all samples are of equal size
n, then df = k(n - 1), since (n - 1) are the degrees of freedom for each sample
and there are k samples).

(v) This figure, SSW/df, is also known as $\sigma^2_{\text{within}}$ or MSW (mean of sum of squares
within samples).

Now, the value of F can be computed as:

\[ F = \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{within}}} = \frac{SSB/df}{SSW/df} = \frac{SSB/(k - 1)}{SSW/(N - k)} = \frac{MSB}{MSW} \]


This value of F is then compared with the critical value of F from the table and a 
decision is made about the validity of null hypothesis. 
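
The steps above for SSB, SSW and the F-ratio can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function name `one_way_anova` and the three small samples are hypothetical, and the grand mean is taken as the average of all items in all samples, as defined above.

```python
def one_way_anova(samples):
    """Return (SSB, SSW, MSB, MSW, F) for a list of samples (lists of numbers)."""
    k = len(samples)                              # number of samples
    n_total = sum(len(s) for s in samples)        # N, total number of items
    means = [sum(s) / len(s) for s in samples]    # mean of each sample
    grand_mean = sum(sum(s) for s in samples) / n_total

    # Sum of squares between samples: n_i * (sample mean - grand mean)^2
    ssb = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Sum of squares within samples: deviations from each sample's own mean
    ssw = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)

    msb = ssb / (k - 1)          # variance between samples, df = k - 1
    msw = ssw / (n_total - k)    # variance within samples, df = N - k
    return ssb, ssw, msb, msw, msb / msw


# Hypothetical data: three samples of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
ssb, ssw, msb, msw, f_ratio = one_way_anova(groups)
print(ssb, ssw, msb, msw, f_ratio)
```

The computed F would then be compared with the table value for (k - 1) and (N - k) degrees of freedom at the chosen level of significance.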


ANOVA Table 


After various calculations for SSB, SSW and the degrees of freedom have been 
made, these figures can be presented in a simple table called Analysis of Variance 
table or simply ANOVA table, as follows: 


ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square          F
Treatment             SSB              (k - 1)              MSB = SSB/(k - 1)    MSB/MSW
Within                SSW              (N - k)              MSW = SSW/(N - k)
Total                 SST

Then,
F = MSB/MSW


A Short-Cut Method 


The formula developed above for the computation of the values of F-statistic is 
rather complex and time consuming when we have to calculate the variance between 
samples and the variance within samples. However, a short-cut, simpler method 
for these sum of squares is available, which considerably reduces the computational 
work. This technique is used through the following steps: 


(i) Take the sum of all the observations of all the samples, either by adding all the
individual values, or by multiplying the mean of each sample by its size and then
adding up all these products as follows:

\[ \text{The Total Sum } TS = n_1\bar{X}_1 + n_2\bar{X}_2 + \dots + n_k\bar{X}_k, \text{ for } k \text{ samples} \]

(ii) Calculate the value of a correction factor. The correction factor (CF) value is
obtained by squaring the total sum obtained above and dividing it by the total
number of observations N, so that:

\[ CF = \frac{(TS)^2}{N} \]

(iii) The total sum of squares is obtained by squaring all individual observations of
all samples, summing up these values and subtracting from this sum the correction
factor (CF).

In other words:

\[ \text{Total sum of squares } SST = \sum X_1^2 + \sum X_2^2 + \dots + \sum X_k^2 - \frac{(TS)^2}{N} \]

Where,

$\sum X_1^2$ = Summation of squares for all X’s in sample 1.

$\sum X_2^2$ = Summation of squares for all X’s in sample 2.

$\sum X_k^2$ = Summation of squares for all X’s in sample k.

(iv) The sum of squares between the samples (SSB) is obtained by the following
formula:

\[ SSB = \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \dots + \frac{(\sum X_k)^2}{n_k} - \frac{(TS)^2}{N} \]

Where,
$(\sum X_1)^2$ = Square of the total of all values in sample 1.

$(\sum X_2)^2$ = Square of the total of all values in sample 2.
$(\sum X_k)^2$ = Square of the total of all values in sample k.


(v) Then sum of squares within samples SSW can be calculated as: 
SSW = Total sum of squares minus the sum of squares between samples 
= SST — SSB 
(vi) The rest of the procedure is similar to the previous method. 
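
The short-cut steps can be sketched in Python as follows. The function name `anova_short_cut` and the sample data are hypothetical and serve only to illustrate the correction-factor arithmetic described above.

```python
def anova_short_cut(samples):
    """Return (SST, SSB, SSW) using the correction-factor (short-cut) method."""
    n_total = sum(len(s) for s in samples)        # N, total number of observations
    total_sum = sum(sum(s) for s in samples)      # TS, sum of all observations
    cf = total_sum ** 2 / n_total                 # CF = (TS)^2 / N

    # SST: sum of squares of every observation, minus the correction factor
    sst = sum(x * x for s in samples for x in s) - cf
    # SSB: (sample total)^2 / sample size, summed over samples, minus CF
    ssb = sum(sum(s) ** 2 / len(s) for s in samples) - cf
    ssw = sst - ssb                               # SSW = SST - SSB
    return sst, ssb, ssw


# Hypothetical data: three samples of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
sst, ssb, ssw = anova_short_cut(groups)
print(sst, ssb, ssw)
```

The method avoids computing any deviations from means, which is why it reduces the computational work when done by hand.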


Example 8: To test whether all professors teach the same material in different 
sections of the introductory statistics class or not, four sections of the same course 
were selected and a common test was administered to five students selected at 
random from each section. The scores for each student from each section were 
noted and are given below. We want to test for any differences in learning, as 
reflected in the average scores for each section. 


Student #   Section 1          Section 2          Section 3          Section 4
            Scores (X1)        Scores (X2)        Scores (X3)        Scores (X4)
1           8                  12                 10                 12
2           10                 12                 13                 15
3           12                 10                 11                 13
4           10                 8                  12                 10
5           5                  13                 14                 10
Totals      ΣX1 = 45           ΣX2 = 55           ΣX3 = 60           ΣX4 = 60
Means       X̄1 = 9             X̄2 = 11            X̄3 = 12            X̄4 = 12


Solution: A. The Traditional Method
(i) State the null hypothesis. We are assuming that there is no significant difference
among the average scores of students from these four sections and hence, all
professors are teaching the same material with the same effectiveness, i.e.,

$H_0$: $\mu_1 = \mu_2 = \mu_3 = \mu_4$

$H_1$: All means are not equal or at least two means differ from each other
(ii) Establish a level of significance. Let $\alpha = 0.05$.
(iii) Calculate the variance between the samples, as follows:

(a) The mean of each sample is:

$\bar{X}_1 = 9,\ \bar{X}_2 = 11,\ \bar{X}_3 = 12,\ \bar{X}_4 = 12$
(b) The grand mean or $\bar{\bar{X}}$ is:

\[ \bar{\bar{X}} = \frac{\sum \bar{X}}{k} = \frac{9 + 11 + 12 + 12}{4} = 11 \]

(c) Calculate the value of SSB:

\[ \begin{aligned} SSB &= \sum n(\bar{X} - \bar{\bar{X}})^2 \\ &= 5(9 - 11)^2 + 5(11 - 11)^2 + 5(12 - 11)^2 + 5(12 - 11)^2 \\ &= 20 + 0 + 5 + 5 \\ &= 30 \end{aligned} \]

(d) The variance between samples $\sigma^2_{\text{between}}$ or MSB is given by:

\[ MSB = \frac{SSB}{df} = \frac{30}{(k - 1)} = \frac{30}{3} = 10 \]
(iv) Calculate the variance within samples, as follows: 


To find the sum of squares within samples (SSW), we square each deviation
between the individual value of each sample and its mean, for all samples and then
sum these squared deviations, as follows:

Sample 1: $\bar{X}_1 = 9$
\[ \begin{aligned} \sum (X_1 - \bar{X}_1)^2 &= (8 - 9)^2 + (10 - 9)^2 + (12 - 9)^2 + (10 - 9)^2 + (5 - 9)^2 \\ &= 1 + 1 + 9 + 1 + 16 \\ &= 28 \end{aligned} \]

Sample 2: $\bar{X}_2 = 11$
\[ \begin{aligned} \sum (X_2 - \bar{X}_2)^2 &= (12 - 11)^2 + (12 - 11)^2 + (10 - 11)^2 + (8 - 11)^2 + (13 - 11)^2 \\ &= 1 + 1 + 1 + 9 + 4 \\ &= 16 \end{aligned} \]

Sample 3: $\bar{X}_3 = 12$
\[ \begin{aligned} \sum (X_3 - \bar{X}_3)^2 &= (10 - 12)^2 + (13 - 12)^2 + (11 - 12)^2 + (12 - 12)^2 + (14 - 12)^2 \\ &= 4 + 1 + 1 + 0 + 4 \\ &= 10 \end{aligned} \]

Sample 4: $\bar{X}_4 = 12$
\[ \begin{aligned} \sum (X_4 - \bar{X}_4)^2 &= (12 - 12)^2 + (15 - 12)^2 + (13 - 12)^2 + (10 - 12)^2 + (10 - 12)^2 \\ &= 0 + 9 + 1 + 4 + 4 \\ &= 18 \end{aligned} \]

Then, SSW = 28 + 16 + 10 + 18
= 72

Now, the variance within samples, $\sigma^2_{\text{within}}$ or MSW, is given by:

\[ MSW = \frac{SSW}{df} = \frac{SSW}{(N - k)} = \frac{72}{20 - 4} = \frac{72}{16} = 4.5 \]

Then, the F-ratio $= \dfrac{MSB}{MSW} = \dfrac{10}{4.5} = 2.22$.

Now, we check for the critical value of F from the table for $\alpha = 0.05$ and degrees
of freedom as follows:

df (numerator) = (k - 1) = (4 - 1) = 3

df (denominator) = (N - k) = (20 - 4) = 16

This value of F from the table is given as 3.24. Now, since our calculated value of
F = 2.22 is less than the critical value of F = 3.24, we cannot reject the null
hypothesis.

B. The Short-Cut Method
Following the procedure outlined before for using the short-cut method, we get:
(i) Total sum (TS) = $\sum X$ = 220

(ii) Correction factor $CF = \dfrac{(TS)^2}{N} = \dfrac{(220)^2}{20} = 2420$

(iii) Total sum of squares:
\[ SST = \sum (X^2) - CF = 2522 - 2420 = 102 \]

(iv) Sum of squares between the samples SSB is obtained by:

\[ \begin{aligned} SSB &= \sum_{i=1}^{k} \frac{(\sum X_i)^2}{n_i} - CF \\ &= \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} + \frac{(\sum X_4)^2}{n_4} - CF \\ &= \frac{(45)^2}{5} + \frac{(55)^2}{5} + \frac{(60)^2}{5} + \frac{(60)^2}{5} - 2420 \\ &= 405 + 605 + 720 + 720 - 2420 \\ &= 30 \end{aligned} \]

(v) SSW can be calculated by:

SST - SSB = 102 - 30 = 72

Now the F value can be calculated as:

\[ F = \frac{SSB/df}{SSW/df} = \frac{30/(k - 1)}{72/(N - k)} = \frac{30/3}{72/16} = \frac{10}{4.5} = 2.22 \]

As we see, we get the same value of F as obtained by the traditional method. So,
we compare our value of F with the critical value of F from the table for $\alpha = 0.05$
and df (numerator = 3), and df (denominator = 16), and we get the critical value
of F as 3.24. As before, we cannot reject the null hypothesis.
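
Both solutions of Example 8 can be reproduced with a few lines of Python. The sketch below uses the scores from the table in the example; the variable names are illustrative, and the critical value 3.24 is the table value quoted in the text rather than something the code computes.

```python
scores = [
    [8, 10, 12, 10, 5],      # Section 1
    [12, 12, 10, 8, 13],     # Section 2
    [10, 13, 11, 12, 14],    # Section 3
    [12, 15, 13, 10, 10],    # Section 4
]
k = len(scores)
n_total = sum(len(s) for s in scores)

# A. Traditional method: deviations from the sample means and the grand mean
means = [sum(s) / len(s) for s in scores]
grand_mean = sum(sum(s) for s in scores) / n_total
ssb = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(scores, means))
ssw = sum((x - m) ** 2 for s, m in zip(scores, means) for x in s)

# B. Short-cut method: correction factor CF = (TS)^2 / N
cf = sum(sum(s) for s in scores) ** 2 / n_total
sst = sum(x * x for s in scores for x in s) - cf
ssb_short = sum(sum(s) ** 2 / len(s) for s in scores) - cf

f_ratio = (ssb / (k - 1)) / (ssw / (n_total - k))
critical_f = 3.24            # table value for alpha = 0.05, df = (3, 16)
print(ssb, ssw, sst, ssb_short, round(f_ratio, 2))
print("reject H0" if f_ratio > critical_f else "cannot reject H0")
```

Both routes give the same SSB, and SST - SSB agrees with the SSW obtained from the deviations.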


The ANOVA Table 


We can construct an ANOVA table for the problem solved above as follows: 


ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square          F
Treatment             SSB = 30         (k - 1) = 3          MSB = 30/3 = 10      MSB/MSW = 10/4.5 = 2.22
Within (or error)     SSW = 72         (N - k) = 16         MSW = 72/16 = 4.5
Total                 SST = 102

Check Your Progress

1. Give the value of the median of the beta distribution.
2. Who developed the t-test? When is it used?
3. On what assumptions is the ANOVA methodology based?
4. What are the major characteristics of the F distribution?


9.3 ANSWERS TO CHECK YOUR PROGRESS 
QUESTIONS 


1. A reasonable approximation of the value of the median of the Beta
distribution, for both $\alpha$ and $\beta$ greater or equal to one, is given by the following
formula:

\[ \text{Median} \approx \frac{\alpha - \frac{1}{3}}{\alpha + \beta - \frac{2}{3}} \quad \text{for } \alpha, \beta \geq 1. \]

2. William S. Gosset (pen name Student) developed a significance test
and through it made significant contribution in the theory of sampling
applicable in case of small samples. The test is used when the population
variance is not known; it is commonly known as Student’s t-test and is based
on the t distribution.

3. The methodology of ANOVA is based on the following assumptions. 


(i) Each sample of size n is drawn randomly and each sample is independent 
of the other samples. 


(ii) The populations are normally distributed. 


(iii) The populations from which the samples are drawn have equal variances. 
This means that: 


\[ \sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \dots = \sigma_k^2, \text{ for } k \text{ populations.} \]

4. The major characteristics of the F-distribution are as follows:
(i) Unlike normal distribution, which is only one type of curve irrespective
of the value of the mean and the standard deviation, the F distribution is
a family of curves. A particular curve is determined by two parameters.
These are the degrees of freedom in the numerator and the degrees of
freedom in the denominator. The shape of the curve changes as the
number of degrees of freedom changes.
(ii) It is a continuous distribution and the value of F cannot be negative.
(iii) The curve representing the F distribution is positively skewed.
(iv) The values of F theoretically range from zero to infinity.

9.4 SUMMARY


In probability theory and statistics, the Beta distribution is a family of
continuous probability distributions defined on the interval [0, 1]
parameterized by two positive shape parameters, denoted by $\alpha$ and $\beta$, that
appear as exponents of the random variable and control the shape of the
distribution.

In Bayesian inference, the Beta distribution is the conjugate prior probability
distribution for the Bernoulli, Binomial and Geometric distributions.

The usual formulation of the Beta distribution is also known as the Beta
distribution of the first kind, whereas Beta distribution of the second kind is
an alternative name for the Beta prime distribution.

The inverse of the Harmonic Mean (H) of a distribution with random variable
X is the arithmetic mean of 1/X, or, equivalently, its expected value.

When population variance is not known, the test is commonly known as
Student’s t-test and is based on the t distribution.

There are two types of classifications involved in the analysis of variance.
The one-way analysis of variance refers to the situations when only one factor
or variable is considered.

In the second type of classification, the response variable of interest may be
affected by more than one factor.

The null hypothesis, that we are going to test, is based upon the assumption
that there is no significant difference among the means of different populations.

Each sample of size n is drawn randomly and each sample is independent
of the other samples.

The variance between samples may be due to the effect of different
treatments, meaning that the population means may be affected by the factor
under consideration, thus, making the population means actually different,
and some variance may be due to the inter-sample variability.


9.5 KEY WORDS

The numerator: Since the variance between samples, $\sigma^2_{\text{between}}$, comes from
many samples and if there are k number of samples, then the degrees of
freedom associated with the numerator would be (k - 1).

The denominator: It is the mean variance of the variances of k samples and
since each variance in each sample is associated with the size of the sample (n),
then the degrees of freedom associated with each sample would be (n - 1).

The F-Distribution: It is a continuous distribution and the value of F cannot
be negative.

9.6 SELF-ASSESSMENT QUESTIONS AND EXERCISES


Short-Answer Questions

1. What is beta distribution?
2. Give the two conditions to use the t-test.
3. What is the analysis of variance?
4. Define the term degree of freedom.

Long-Answer Questions

1. Briefly explain the beta distribution.
2. Describe the formulae which are used to calculate the value of t.
3. Explain the F distribution and give its assumptions.
4. Discuss the computation of F.

10.0 INTRODUCTION


In probability theory and statistics, the moment generating function of a real- 
valued random variable is an alternative specification of its probability distribution. 
Thus, it provides the basis of an alternative route to analytical results compared
with working directly with probability density functions or cumulative distribution 
functions. There are particularly simple results for the moment-generating functions 
of distributions defined by the weighted sums of random variables. However, not 
all random variables have moment-generating functions. As its name implies, the 
moment generating function can be used to compute a distribution’s moments: 
the nth moment about 0 is the nth derivative of the moment-generating function, 
evaluated at 0. 


In this unit, you will study about the distributions of order statistics and the 
moment generating function techniques. 


10.1 OBJECTIVES 


After going through this unit, you will be able to: 
• Understand the distributions of order statistics
• Explain the moment generating function techniques


10.2 DISTRIBUTIONS OF ORDER STATISTICS 


In statistics, the kth order statistic of a statistical sample is equal to its kth-smallest
value. Together with rank statistics, order statistics are among the most fundamental
tools in non-parametric statistics and inference. Important special cases of the order
statistics are the minimum and maximum value of a sample, and (with some
qualifications discussed below) the sample median and other sample quantiles.

When using probability theory to analyse order statistics of random samples from
a continuous distribution, the cumulative distribution function is used to reduce the
analysis to the case of order statistics of the uniform distribution.
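
As a small illustration of the definition, the kth order statistic of a sample can be obtained by sorting. The sketch below is in Python; the function name `order_statistic` and the sample values are hypothetical.

```python
def order_statistic(sample, k):
    """Return the kth order statistic (kth-smallest value), with k counted from 1."""
    if not 1 <= k <= len(sample):
        raise ValueError("k must be between 1 and the sample size")
    return sorted(sample)[k - 1]


sample = [6, 9, 3, 8, 7]                          # hypothetical sample of size 5
minimum = order_statistic(sample, 1)              # first order statistic
median = order_statistic(sample, 3)               # middle value for an odd sample size
maximum = order_statistic(sample, len(sample))    # nth order statistic
print(minimum, median, maximum)
```

For an even sample size the median is not a single order statistic, which is one of the qualifications mentioned above.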


10.2.1 The Moment Generating Function 


A function that generates moments of a random variable is known as the Moment
Generating Function (MGF) of the random variable. A moment generating function
may or may not exist but if it exists it is unique.

For the random variable X, its moment generating function $M_X(t)$ or $\psi(t)$ is
defined as follows:

\[ \psi(t) \text{ or } M_X(t) = E(e^{tX}) = \begin{cases} \sum_i e^{t x_i} f_i, & \text{when } X \text{ is discrete and } f_i = P(X = x_i) \\ \int_{-\infty}^{\infty} e^{tx} f(x)\,dx, & \text{when } X \text{ is continuous and } f(x) \text{ is the p.d.f. of } X. \end{cases} \]


Note that $E(e^{tX})$ is a function of t. The rth raw moment of X is the coefficient of
$\dfrac{t^r}{r!}$ in the power series expansion of $M_X(t)$. That is, if

\[ M_X(t) = a_0 + a_1 t + a_2 t^2 + \dots \text{ to } \infty, \]

Then, $(r!)\,a_r = \mu'_r$.

Hence, $\mu'_r = \left. \dfrac{d^r}{dt^r} M_X(t) \right|_{t=0} = M_X^{(r)}(0)$

Notes: (i) We have $M_X(t) = E(e^{tX})$.
Differentiating $M_X(t)$ with respect to t, k times, we get
\[ M_X^{(k)}(t) = E(X^k e^{tX}) \]

$\therefore M_X^{(k)}(0) = E(X^k) = \mu'_k$ or $\alpha_k$, provided the moment exists. Thus,
$\alpha_k$ is obtained by differentiating $M_X(t)$ k times and putting t = 0.


(ii) We assume that $M_X(t)$ or $\psi(t)$ can be expanded as a power series in t
by Maclaurin’s theorem. Then we have,

\[ \begin{aligned} \psi(t) &= \psi(0) + t\,\psi'(0) + \frac{t^2}{2!}\psi''(0) + \dots \text{ to } \infty \\ &= \sum_{k=0}^{\infty} \frac{t^k}{k!}\,\psi^{(k)}(0), \text{ where } \psi^{(0)}(0) = \psi(0) = E(e^{0}) = E(1) = 1 \\ &= \sum_{k=0}^{\infty} \frac{t^k}{k!}\,\alpha_k \end{aligned} \]

Thus, it is proved that the kth order moment about origin $\alpha_k$, if it exists,
is equal to the coefficient of $\dfrac{t^k}{k!}$ in the expansion of $\psi(t)$ as a power
series in t for any positive integer k. This is why $\psi(t)$ or $M_X(t)$ is called
moment generating function.

(iii) If a and b are constants, then
(a) $M_{X+a}(t) = E(e^{t(X+a)}) = e^{at} E(e^{tX}) = e^{at} M_X(t)$
(b) $M_{aX}(t) = E(e^{taX}) = M_X(at)$
(c) $M_{\frac{X+a}{b}}(t) = E\left(e^{t\left(\frac{X+a}{b}\right)}\right) = e^{\frac{at}{b}} M_X\left(\frac{t}{b}\right)$


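Example 1: Find the moment generating function of the random variable X whose
probability function is,

\[ P(X = x) = \begin{cases} \frac{1}{2} & \text{if } x = -1 \\ \frac{1}{2} & \text{if } x = 1 \\ 0 & \text{elsewhere} \end{cases} \]

Hence, obtain the first three raw and central moments of X.

Solution: We have,

\[ M_X(t) = E(e^{tX}) = \frac{1}{2} e^{-t} + \frac{1}{2} e^{t} = \frac{1}{2}(e^{t} + e^{-t}) = \cosh t \]

Now, for the raw moments, we expand $M_X(t)$ in an infinite power series as
follows:

\[ M_X(t) = 1 + \frac{t^2}{2!} + \frac{t^4}{4!} + \dots \text{ to } \infty \]

Hence, $\mu'_1 = 0,\ \mu'_2 = 1,\ \mu'_3 = 0,\ \mu'_4 = 1$

[Note $M'_X(0) = 0,\ M''_X(0) = 1,\ M'''_X(0) = 0,\ M''''_X(0) = 1$]

$\mu_1 = 0,\ \mu_2 = \mu'_2 - (\mu'_1)^2 = 1,\ \mu_3 = 0,\ \mu_4 = 1$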
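
The moments in Example 1 can be cross-checked directly from the probability function, without using the series. The Python sketch below is only a check under that approach; the helper names `raw_moment` and `central_moment` are illustrative.

```python
from fractions import Fraction

# Probability function of Example 1: P(X = -1) = P(X = 1) = 1/2.
pmf = {-1: Fraction(1, 2), 1: Fraction(1, 2)}


def raw_moment(r):
    """rth raw moment E(X^r); equals r! times the coefficient of t^r in M_X(t)."""
    return sum(p * x ** r for x, p in pmf.items())


def central_moment(r):
    """rth central moment E[(X - mean)^r], computed directly from the pmf."""
    mean = raw_moment(1)
    return sum(p * (x - mean) ** r for x, p in pmf.items())


raw = [raw_moment(r) for r in range(1, 5)]
central = [central_moment(r) for r in range(1, 5)]
print(raw, central)
```

Since the mean is zero, each central moment coincides with the corresponding raw moment, which matches the coefficients of the expansion of cosh t.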


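Example 2: Find the moment generating function of the random variable whose
probability function is,

\[ f(x) = \begin{cases} \frac{1}{2} & \text{if } 0 < x < 2 \\ 0 & \text{otherwise} \end{cases} \]

Find the central moments also.

Solution: We see,

\[ M_X(t) = E(e^{tX}) = \frac{1}{2} \int_0^2 e^{tx}\,dx = \frac{1}{2} \left[ \frac{e^{tx}}{t} \right]_0^2 = \frac{1}{2t}(e^{2t} - 1) \]

Writing in a power series, we get

\[ M_X(t) = 1 + \frac{2t}{2!} + \frac{(2t)^2}{3!} + \frac{(2t)^3}{4!} + \frac{(2t)^4}{5!} + \dots \text{ to } \infty \]

Or,

\[ M_X(t) = 1 + t + \frac{4}{3} \cdot \frac{t^2}{2!} + 2 \cdot \frac{t^3}{3!} + \frac{16}{5} \cdot \frac{t^4}{4!} + \dots \text{ to } \infty \]

Now, $\mu'_1 = 1,\ \mu'_2 = \dfrac{4}{3},\ \mu'_3 = 2,\ \mu'_4 = \dfrac{16}{5}$

$\mu_1 = 0$,

$\mu_2 = \mu'_2 - (\mu'_1)^2 = \dfrac{4}{3} - 1 = \dfrac{1}{3}$

$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3 = 2 - 3\left(\dfrac{4}{3}\right) + 2(1)^3 = 0$

$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4 = \dfrac{16}{5} - 4(2) + 6\left(\dfrac{4}{3}\right) - 3(1) = \dfrac{1}{5}$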

Definition: The characteristic function $\phi_X(t)$ of a random variable X is defined as:
\[ \phi_X(t) = E(e^{itX}) \]
Note that the characteristic function of a random variable always exists.

A characteristic function of a random variable also gives the moments of the
random variable. In fact, the coefficient of $\dfrac{(it)^r}{r!}$ is the rth raw moment.


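Example 3: Obtain the characteristic function of the Laplace variate whose
probability function is:

\[ f(x) = \frac{1}{2\lambda} e^{-\frac{|x - \mu|}{\lambda}}, \quad -\infty < x < \infty,\ \lambda > 0. \]

Solution: We have

\[ \begin{aligned} \phi_X(t) &= \int_{-\infty}^{\infty} e^{itx} \frac{1}{2\lambda} e^{-\frac{|x - \mu|}{\lambda}}\,dx \\ &= \frac{1}{2\lambda} \left[ \int_{-\infty}^{\mu} e^{itx} e^{\frac{x - \mu}{\lambda}}\,dx + \int_{\mu}^{\infty} e^{itx} e^{-\frac{x - \mu}{\lambda}}\,dx \right] \\ &= \frac{1}{2\lambda} \left[ e^{-\frac{\mu}{\lambda}} \left[ \frac{e^{\left(it + \frac{1}{\lambda}\right)x}}{it + \frac{1}{\lambda}} \right]_{-\infty}^{\mu} + e^{\frac{\mu}{\lambda}} \left[ \frac{e^{\left(it - \frac{1}{\lambda}\right)x}}{it - \frac{1}{\lambda}} \right]_{\mu}^{\infty} \right] \\ &= \frac{1}{2\lambda} \left[ \frac{e^{it\mu}}{it + \frac{1}{\lambda}} - \frac{e^{it\mu}}{it - \frac{1}{\lambda}} \right] \\ &= \frac{1}{2} \left[ \frac{e^{it\mu}}{1 + it\lambda} - \frac{e^{it\mu}}{it\lambda - 1} \right] \\ &= \frac{e^{it\mu}}{1 + t^2\lambda^2} \end{aligned} \]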

Probability Differential: Let X be a continuous random variable. If $\delta x > 0$, then

\[ \begin{aligned} P(x < X \le x + \delta x) &= F(x + \delta x) - F(x) \\ &= \delta x\, F'(x + \theta\,\delta x), \text{ where } 0 < \theta < 1 \end{aligned} \]
[by Lagrange mean value theorem of differential calculus]

So, $\dfrac{P(x < X \le x + \delta x)}{\delta x} = F'(x + \theta\,\delta x)$

$\therefore \lim_{\delta x \to 0} \dfrac{P(x < X \le x + \delta x)}{\delta x} = \lim_{\delta x \to 0} F'(x + \theta\,\delta x) = F'(x)$, if x be a point of continuity
of $F'(x)$, $= f(x)$, where f(x) is the probability density function of X.

Thus, we get

\[ f(x) = \lim_{\delta x \to 0} \frac{P(x < X \le x + \delta x)}{\delta x} = \lim_{dx \to 0} \frac{P(x < X \le x + dx)}{dx} \]

[$\because \delta x$ = differential of x = dx for the independent variable x]

Henceforth, we shall write f(x) dx for $P(x < X \le x + dx)$, which will actually
mean $\lim_{dx \to 0} \dfrac{P(x < X \le x + dx)}{dx} = f(x)$.

The expression $P(x < X \le x + dx)$ will always be used in the above limiting
sense and so there will be no ambiguity throughout our discussion. The expression
$P(x < X \le x + dx)$, which is taken to be equal to f(x) dx, is called the probability
differential for the continuous random variable X.

Calculation of Mean and Variance from Moment Generating Functions


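1. Binomial Distribution: Let X be a binomial (n, p) variate, then the M.G.F.
of X is given by:

\[ \begin{aligned} \psi(t) &= (q + pe^t)^n, \text{ where } q = 1 - p \\ \psi'(t) &= n(q + pe^t)^{n-1} pe^t \\ \psi''(t) &= n(n - 1)(q + pe^t)^{n-2} (pe^t)^2 + np(q + pe^t)^{n-1} e^t \\ \psi'(0) &= n(q + p)^{n-1} p = np = E(X) = m \quad [\because (p + q) = 1] \end{aligned} \]

And
\[ \begin{aligned} \alpha_2 = \psi''(0) &= n(n - 1)(q + p)^{n-2} p^2 + np \\ &= n(n - 1)p^2 + np \\ \mathrm{var}(X) = \alpha_2 - m^2 &= n(n - 1)p^2 + np - n^2p^2 \\ &= n^2p^2 - np^2 + np - n^2p^2 = np - np^2 = np(1 - p) \\ &= npq \end{aligned} \]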
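2. Poisson Distribution: Let X be a Poisson $\lambda$ variate, then the M.G.F. of X is
given by:

\[ \begin{aligned} \psi(t) &= e^{\lambda(e^t - 1)} \\ \psi'(t) &= e^{\lambda(e^t - 1)} \lambda e^t \\ \psi''(t) &= \lambda e^t\, \lambda e^t\, e^{\lambda(e^t - 1)} + \lambda e^t\, e^{\lambda(e^t - 1)} \\ \psi'(0) &= E(X) = m = \lambda \end{aligned} \]

And $\alpha_2 = \psi''(0) = \lambda^2 + \lambda$

$\mathrm{var}(X) = \alpha_2 - m^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$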


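3. Uniform Distribution: Let X be a uniform random variable in [a, b]; then
the M.G.F. of X is given by:

\[ \begin{aligned} \psi(t) &= \frac{e^{tb} - e^{ta}}{t(b - a)} \\ &= \frac{1}{t(b - a)} \left[ t(b - a) + \frac{t^2(b^2 - a^2)}{2!} + \frac{t^3(b^3 - a^3)}{3!} + \frac{t^4(b^4 - a^4)}{4!} + \dots \right] \\ &= 1 + \frac{(b + a)t}{2} + \frac{(b^2 + a^2 + ab)t^2}{6} + \frac{(b^2 + a^2)(b + a)t^3}{24} + \dots \\ \psi'(t) &= \frac{b + a}{2} + \frac{(b^2 + a^2 + ab)t}{3} + \frac{(b^2 + a^2)(b + a)t^2}{8} + \dots \\ \psi''(t) &= \frac{b^2 + a^2 + ab}{3} + \frac{(b^2 + a^2)(b + a)t}{4} + \dots \\ \psi'(0) &= E(X) = m = \frac{b + a}{2} \end{aligned} \]

And $\alpha_2 = \psi''(0) = \dfrac{b^2 + a^2 + ab}{3}$

\[ \begin{aligned} \mathrm{var}(X) = \alpha_2 - m^2 &= \frac{b^2 + a^2 + ab}{3} - \left( \frac{b + a}{2} \right)^2 \\ &= \frac{4b^2 + 4a^2 + 4ab - 3b^2 - 3a^2 - 6ab}{12} \\ &= \frac{b^2 + a^2 - 2ab}{12} = \frac{(b - a)^2}{12} \end{aligned} \]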
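4. Normal Distribution: Let X be a normal (m, $\sigma$) variate; then the M.G.F. of
X is given by:

$$\psi(t) = e^{tm + \frac{1}{2}t^{2}\sigma^{2}}$$

$$\psi'(t) = e^{tm + \frac{1}{2}t^{2}\sigma^{2}}\,(m + t\sigma^{2})$$

$$\psi''(t) = (m + t\sigma^{2})^{2}\, e^{tm + \frac{1}{2}t^{2}\sigma^{2}} + \sigma^{2} e^{tm + \frac{1}{2}t^{2}\sigma^{2}}$$

$$\psi'(0) = E(X) = \text{Mean} = m$$

And

$$\psi''(0) = \alpha_2 = m^{2} + \sigma^{2}$$

$$\operatorname{var}(X) = \alpha_2 - m^{2} = m^{2} + \sigma^{2} - m^{2} = \sigma^{2}$$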


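5. Exponential Distribution: Let X be a random variable of the exponential
distribution with parameter $\lambda$; then the M.G.F. of X is given by:

$$\psi(t) = \left(1 - \frac{t}{\lambda}\right)^{-1}, \quad t < \lambda$$

$$\psi'(t) = \frac{1}{\lambda}\left(1 - \frac{t}{\lambda}\right)^{-2}$$

$$\psi''(t) = \frac{2}{\lambda^{2}}\left(1 - \frac{t}{\lambda}\right)^{-3}$$

$$\psi'(0) = \frac{1}{\lambda} = m = \text{Mean}$$

And

$$\psi''(0) = \alpha_2 = \frac{2}{\lambda^{2}}$$

$$\operatorname{var}(X) = \alpha_2 - m^{2} = \frac{2}{\lambda^{2}} - \frac{1}{\lambda^{2}} = \frac{1}{\lambda^{2}}$$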
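The five calculations above follow one recipe: the mean is the first derivative of the M.G.F. at t = 0 and the variance is the second derivative at 0 minus the square of the mean. The sketch below checks this numerically with central finite differences. The parameter values, the function names and the step size h are illustrative assumptions, and the exponential M.G.F. is taken in the rate form (1 - t/λ)^(-1).

```python
import math

def mean_and_variance(mgf, h=1e-3):
    """Approximate psi'(0) and psi''(0) by central differences; return (mean, variance)."""
    first = (mgf(h) - mgf(-h)) / (2 * h)                  # psi'(0)  = E(X)
    second = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h ** 2   # psi''(0) = E(X^2)
    return first, second - first ** 2

# Illustrative parameter values (assumptions, not taken from the text).
n, p = 10, 0.3          # binomial
lam = 2.5               # Poisson
a, b = 1.0, 4.0         # uniform on [a, b]
m, sigma = 1.5, 0.8     # normal
rate = 2.0              # exponential (rate form)

def uniform_mgf(t):
    # The closed form has a removable singularity at t = 0, where psi(0) = 1.
    if t == 0.0:
        return 1.0
    return (math.exp(t * b) - math.exp(t * a)) / (t * (b - a))

mgfs = {
    "binomial": lambda t: (1 - p + p * math.exp(t)) ** n,
    "poisson": lambda t: math.exp(lam * (math.exp(t) - 1)),
    "uniform": uniform_mgf,
    "normal": lambda t: math.exp(t * m + 0.5 * t * t * sigma ** 2),
    "exponential": lambda t: 1.0 / (1.0 - t / rate),
}

results = {name: mean_and_variance(mgf) for name, mgf in mgfs.items()}
for name, (mean, variance) in results.items():
    print(f"{name:12s} mean = {mean:.4f}  variance = {variance:.4f}")
```

The printed values should agree, up to finite-difference error, with np and npq, λ and λ, (a + b)/2 and (b - a)²/12, m and σ², and 1/λ and 1/λ² respectively.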


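Example 4: The probability mass function of a random variable X is
$f_i = P(X = i) = 2^{-i}$, where i = 1, 2, 3, ... Find the M.G.F. of X and hence find the mean
and variance of X.

Solution: Now

$$\psi(t) = \text{M.G.F. of } X = E(e^{tX}) = \sum_{i=1}^{\infty} e^{ti} f_i = \sum_{i=1}^{\infty} \left(\frac{e^{t}}{2}\right)^{i} = \frac{e^{t}}{2} \sum_{k=0}^{\infty} \left(\frac{e^{t}}{2}\right)^{k}$$

$$= \frac{e^{t}}{2}\left[1 + r + r^{2} + r^{3} + \dots\right], \quad \text{where } r = \frac{e^{t}}{2}$$

$$= \frac{e^{t}}{2} \cdot \frac{1}{1 - e^{t}/2} = \frac{e^{t}}{2 - e^{t}} \quad (e^{t} < 2)$$

$$\psi'(t) = \frac{e^{t}(2 - e^{t}) - e^{t}(-e^{t})}{(2 - e^{t})^{2}} = \frac{e^{t}\left[2 - e^{t} + e^{t}\right]}{(2 - e^{t})^{2}} = \frac{2e^{t}}{(2 - e^{t})^{2}}$$

$$\psi''(t) = \frac{(2 - e^{t})^{2}\, 2e^{t} - 2e^{t} \cdot 2(2 - e^{t})(-e^{t})}{(2 - e^{t})^{4}} = \frac{2\left[e^{t}(2 - e^{t}) + 2e^{t}e^{t}\right]}{(2 - e^{t})^{3}} = \frac{2e^{t}(2 + e^{t})}{(2 - e^{t})^{3}}$$

$$\psi'(0) = m = \text{Mean} = E(X) = 2$$

And

$$\alpha_2 = \psi''(0) = \frac{2(2 + 1)}{(2 - 1)^{3}} = 6$$

$$\operatorname{Var}(X) = \alpha_2 - m^{2} = 6 - 4 = 2$$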
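As a cross-check on Example 4, the mean and variance can also be obtained directly from the probability mass function, without the M.G.F. The sketch below truncates the infinite sums at i = 199, which is an assumption made for illustration; the neglected tail is far below floating-point precision.

```python
# Direct check of Example 4 from the probability mass function f_i = 2**(-i).
terms = range(1, 200)   # truncation point chosen for illustration

total_probability = sum(2.0 ** (-i) for i in terms)
mean = sum(i * 2.0 ** (-i) for i in terms)               # E(X)
second_moment = sum(i * i * 2.0 ** (-i) for i in terms)  # E(X^2) = alpha_2
variance = second_moment - mean ** 2

print(total_probability, mean, second_moment, variance)
```

The sums agree with the values ψ'(0) = 2, ψ''(0) = 6 and Var(X) = 2 found from the M.G.F.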


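Example 5: Find the M.G.F. of the following continuous distribution with
probability density function:

$$f(x) = \begin{cases} \dfrac{1}{2} x^{2} e^{-x} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases}$$

and hence find the mean and variance.

Solution: The M.G.F. of this distribution is given by:

$$\psi(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\, dx = \int_{0}^{\infty} e^{tx}\, \frac{x^{2}}{2} e^{-x}\, dx = \int_{0}^{\infty} e^{-(1 - t)x}\, \frac{x^{2}}{2}\, dx$$

$$= \frac{1}{2(1 - t)^{3}} \int_{0}^{\infty} z^{2} e^{-z}\, dz, \quad \text{where } (1 - t)x = z,\ dx = \frac{dz}{1 - t},\ t < 1$$

$$= \frac{\Gamma(3)}{2(1 - t)^{3}} = \frac{2}{2(1 - t)^{3}} = \frac{1}{(1 - t)^{3}}$$

$$\psi'(t) = \frac{3}{(1 - t)^{4}}, \qquad \psi''(t) = \frac{12}{(1 - t)^{5}}$$

$$\psi'(0) = 3 = E(X) = m = \text{Mean}$$

And

$$\psi''(0) = \alpha_2 = 12$$

$$\operatorname{var}(X) = \alpha_2 - m^{2} = 12 - 9 = 3$$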


Check Your Progress 


1. What is a moment generating function?
2. What is the probability differential?


3. Calculate the variance of a Poisson variate X from its moment generating
function.


10.3 ANSWERS TO CHECK YOUR PROGRESS 
QUESTIONS 


1. A function that generates the moments of a random variable is known as the
Moment Generating Function (MGF) of the random variable. A moment
generating function may or may not exist, but if it exists it is unique.


2. The expression $P(x < X \le x + dx)$ is always used in the limiting sense
$\lim_{dx \to 0} P(x < X \le x + dx)/dx = f(x)$, so there is no ambiguity. The
expression $P(x < X \le x + dx)$, which is taken to be equal to f(x) dx, is
called the probability differential for the continuous random variable X.


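3. Poisson Distribution: Let X be a Poisson ($\lambda$) variate; then the M.G.F. of X is
given by:

$$\psi(t) = e^{\lambda(e^{t} - 1)}$$

$$\psi'(t) = e^{\lambda(e^{t} - 1)}\, \lambda e^{t}$$

$$\psi''(t) = \lambda^{2} e^{2t} e^{\lambda(e^{t} - 1)} + \lambda e^{t} e^{\lambda(e^{t} - 1)}$$

$$\psi'(0) = E(X) = m = \lambda$$

And

$$\alpha_2 = \psi''(0) = \lambda^{2} + \lambda$$

$$\operatorname{var}(X) = \alpha_2 - m^{2} = \lambda^{2} + \lambda - \lambda^{2} = \lambda$$


10.4 SUMMARY


• If a and b are constants, then

(a) $M_{X+a}(t) = E(e^{t(X + a)}) = e^{at} E(e^{tX}) = e^{at} M_X(t)$

(b) $M_{aX}(t) = E(e^{taX}) = M_X(at)$

(c) $M_{\frac{X + a}{b}}(t) = e^{\frac{at}{b}} M_X\!\left(\frac{t}{b}\right)$

• The characteristic function $\phi_X(t)$ of a random variable X is defined as:

$$\phi_X(t) = E(e^{itX})$$

• Let X be a continuous random variable. If $\delta x > 0$, then
$P(x < X \le x + \delta x) = F(x + \delta x) - F(x)$.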


10.5 KEY WORDS 


• Moment Generating Function: A function that generates moments of a
random variable is known as the Moment Generating Function (MGF) of
the random variable.

• Probability differential: The expression $P(x < X \le x + dx)$, which is
taken to be equal to f(x) dx, is called the probability differential for the
continuous random variable X.


10.6 SELF-ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 


1. Define moment generating function for the random variable X. 
2. What is probability differential? 
3. Calculate the mean and variance from the moment generating function of
the binomial distribution.
Long-Answer Questions


1. Describe the distributions of order statistics.
2. Briefly explain the moment generating function.


3. Give the calculation of mean and variance from moment generating functions.
11.0 INTRODUCTION


In probability theory, the expected value of a random variable is a generalization
of the weighted average and intuitively is the arithmetic mean of a large number 
of independent realizations of that variable. The expected value is also known as 
the expectation, mathematical expectation, mean, average, or first moment. 


In probability theory and statistics, a probability distribution is the 
mathematical function that gives the probabilities of occurrence of different possible 
outcomes for an experiment. More specifically, the probability distribution is a 
mathematical description of a random phenomenon in terms of the probabilities of
events. For instance, if the random variable X is used to denote the outcome of a
coin toss experiment, then the probability distribution of X would take the value
0.5 for X = Heads, and 0.5 for X = Tails (assuming the coin is fair).


Suppose that to each point of a sample space we assign a number. We then 
have a function defined on the sample space. This function is called a random 
variable (or stochastic variable) or more precisely a random function (stochastic 
function). It is usually denoted by a capital letter, such as X or Y. In general, a 
random variable has some specified physical, geometrical, or other significance. 


A probability distribution is a table or an equation that links each outcome 
of a statistical experiment with its probability of occurrence. 

In this unit, you will study about the distributions of $\bar{X}$ and $ns^2/\sigma^2$ and
expectations of functions of random variables.


11.1 OBJECTIVES


After going through this unit, you will be able to:
• Understand the distributions of $\bar{X}$ and $ns^2/\sigma^2$
• Analyse the expectations of functions of random variables


11.2 THE DISTRIBUTIONS OF $\bar{X}$ AND $ns^2/\sigma^2$


In probability theory and statistics, a probability distribution is the mathematical 
function that gives the probabilities of occurrence of different possible outcomes 
for an experiment. More specifically, the probability distribution is a mathematical 
description of a random phenomenon in terms of the probabilities of events. For 
instance, if the random variable X is used to denote the outcome of a coin toss 
experiment, then the probability distribution of X would take the value 0.5 for X= 
Heads, and 0.5 for X= Tails (assuming the coin is fair). 

An example will make clear the relationship between random variables and 
probability distributions. Suppose you flip a coin two times. This simple statistical 
experiment can have four possible outcomes: HH, HT, TH, and TT. Now, let the 
variable X represent the number of Heads that result from this experiment. The 
variable X can take on the values 0, 1, or 2. In this example, X is a random
variable because its value is determined by the outcome of a statistical experiment.
Therefore, a probability distribution is a table or an equation that links each outcome
of a statistical experiment with its probability of occurrence.

Suppose that to each point of a sample space we assign a number. We then
have a function defined on the sample space. This function is called a random 
variable (or stochastic variable) or more precisely a random function (stochastic 
function). It is usually denoted by a capital letter, such as X or Y. In general, a 
random variable has some specified physical, geometrical, or other significance. 


In the test of independence, the null hypothesis is that the row and column
variables are independent of each other. The following are properties of the test
for independence:


• The data are the observed frequencies.
• The data are arranged into a contingency table.
• The degrees of freedom are the degrees of freedom for the row variable
times the degrees of freedom for the column variable. It is not one less
than the sample size; it is the product of the two degrees of freedom.
• It is always a right-tailed test.
• It has a chi-square distribution.
• The expected value is computed by taking the row total times the column
total and dividing by the grand total.
• The value of the test statistic does not change if the order of the rows or
columns is switched.
• The value of the test statistic does not change if the rows and columns
are interchanged (transpose of the matrix).


Contingency Tables 


Suppose the frequencies in the data are classified according to attribute A into r
classes (rows) and according to attribute B into c classes (columns) as follows:


Class    B_1      B_2      ...    B_c      Total
A_1      O_11     O_12     ...    O_1c     (A_1)
A_2      O_21     O_22     ...    O_2c     (A_2)
...      ...      ...      ...    ...      ...
A_r      O_r1     O_r2     ...    O_rc     (A_r)
Total    (B_1)    (B_2)    ...    (B_c)    N


The totals of the row and column frequencies are $(A_i)$ and $(B_j)$.
To test if there is any relation between A and B, we set up the null hypothesis of
independence between A and B.


The expected frequency in any cell is calculated by using the formula:

$$E_{ij} = \frac{(A_i) \times (B_j)}{N}$$

Use

$$\chi^{2} = \sum_{i,j} \frac{(O_{ij} - E_{ij})^{2}}{E_{ij}}$$

with degrees of freedom = (r − 1)(c − 1).
For example,

Observed Frequencies

         School   College   University   Total
Boys     10       15        25           50
Girls    25       10        15           50
Total    35       25        40           100

The expected frequencies in each row are

$$\frac{35 \times 50}{100} = 17.5, \qquad \frac{25 \times 50}{100} = 12.5, \qquad \frac{40 \times 50}{100} = 20$$

Expected Frequencies

         School   College   University   Total
Boys     17.5     12.5      20           50
Girls    17.5     12.5      20           50
Total    35       25        40           100

Degrees of freedom = (2 − 1)(3 − 1) = 2

$$\chi^{2} = \sum \frac{(O - E)^{2}}{E} = 9.9$$

This is greater than the table value (5.991 at the 5% level of significance for
2 degrees of freedom). We therefore reject the null hypothesis that education does
not depend on sex, i.e., the two are not independent.
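The calculation in this example can be reproduced with a few lines of code. The following is a minimal sketch in plain Python; the function names are illustrative and not part of any library. It computes the expected frequencies from the row and column totals and then the chi-square statistic.

```python
# Observed frequencies from the example (rows: Boys, Girls;
# columns: School, College, University).
observed = [
    [10, 15, 25],
    [25, 10, 15],
]

def expected_frequencies(table):
    """E_ij = (row total) * (column total) / (grand total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Sum of (O - E)^2 / E over all cells of the contingency table."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

expected = expected_frequencies(observed)
chi2 = chi_square_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected, round(chi2, 2), dof)
```

Transposing the table (swapping rows and columns) leaves the statistic unchanged, in line with the properties of the test listed earlier.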


Concept of Test Statistics 


In the test for a given population variance, the variance is the square of the standard
deviation, so whatever you say about a variance can be, for all practical purposes,
extended to a population standard deviation.


To test the hypothesis that a sample $x_1, x_2, \dots, x_n$ of size n has a specified
variance $\sigma^{2} = \sigma_0^{2}$:

Null hypothesis $H_0: \sigma^{2} = \sigma_0^{2}$ (or, equivalently, $\sigma = \sigma_0$)

Alternative hypothesis $H_1: \sigma^{2} > \sigma_0^{2}$

Test statistic

$$\chi^{2} = \frac{ns^{2}}{\sigma_0^{2}} = \frac{\sum (x_i - \bar{x})^{2}}{\sigma_0^{2}}$$

If $\chi^{2}$ is greater than the table value we reject the null hypothesis.
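The test statistic can be computed directly from a sample. The sketch below uses made-up observations and a made-up hypothesised variance, so both are assumptions for illustration only. It takes s² as the sample variance with divisor n, so that ns² equals the sum of squared deviations from the sample mean.

```python
def variance_test_statistic(sample, sigma0_squared):
    """Return chi-square = n * s^2 / sigma0^2 = sum((x - mean)^2) / sigma0^2."""
    n = len(sample)
    mean = sum(sample) / n
    n_s_squared = sum((x - mean) ** 2 for x in sample)  # equals n * s^2
    return n_s_squared / sigma0_squared

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical observations
sigma0_squared = 4.0                                # hypothetical hypothesised variance
chi_square = variance_test_statistic(sample, sigma0_squared)
print(chi_square)
```

Under the usual assumption of a normal population, the resulting value would be compared with the chi-square table value for n − 1 degrees of freedom.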


11.3 EXPECTATIONS OF FUNCTIONS OF 
RANDOM VARIABLES 


If p is the probability of the happening of an event in a single trial, then
the expected number of occurrences of that event in n trials is given by n·p, where
n means the number of trials and p means the probability of happening of the event.
Thus, the expectation may be regarded as the likely number of successes in n trials.
If the probability p is determined as the relative frequency in n trials, then the
mathematical expectation in these n trials would be equal to the actual (observed)
number of successes in these n trials. Mathematical expectation does not
mean that the event concerned must happen the number of times given by the
mathematical expectation; it simply gives the likely number of times the event
happens in n trials. Mathematical expectation can be explained with the help of the
following examples.


Example 1: In 12000 trials of a throw of two fair dice, what is the expected
number of times that the sum will be less than 4?


Solution: With two fair dice, the total number of equally likely cases = 6 × 6 = 36.
Number of cases favourable to the event in a single throw of two dice = 3,
viz., (1+1, 1+2, 2+1).

$$\text{The required } p = \frac{3}{36} = \frac{1}{12}$$

Hence, the expected number of times the total will be less than 4 in 12000
trials

$$= \frac{1}{12} \times 12000 = 1000$$
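The counting in Example 1 can be verified by enumerating all 36 outcomes. This is a small sketch using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of throwing two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes whose sum is less than 4: (1, 1), (1, 2), (2, 1).
favourable = [o for o in outcomes if sum(o) < 4]

p = Fraction(len(favourable), len(outcomes))   # probability in a single throw
expected_count = p * 12000                     # expected number n * p in 12000 trials
print(favourable, p, expected_count)
```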


The concept of expectation is of great use in the analysis of all games of
chance wherein an effort is made to evaluate the expectations of the players. If p
represents the probability of success of a player in any game and M the sum of money
which he will receive in case of success, the sum of money denoted by (p·M) is called
his expectation. Thus, the expectation is calculated by finding the probability of success
(by any of the methods stated so far) and then multiplying it by the money value
which the player expects in case of success. The significance of this expectation
lies in the fact that it is the fair price per game: if a player pays more than this per
game and plays long enough, he is sure to lose, whereas if he pays less than his
expectation per game he is certain to win in the long run. It is on this principle that
speculators and businessmen take decisions in real life situations.


Example 2: A and B throw with one die for a stake of Rs 44 which is to be won
by the player who first throws 2. If A has the first throw, what are their respective
expectations?


Solution: The chance of throwing 2 with one die = $\frac{1}{6}$. A can win in the first, third,
fifth, ... throws.
His chance of throwing 2 is,

$$\frac{1}{6}\left[1 + \left(\frac{5}{6}\right)^{2} + \left(\frac{5}{6}\right)^{4} + \dots\right]$$

B can win in the second, fourth, sixth, ... throws.
His chance of throwing 2 is,

$$\frac{5}{6} \cdot \frac{1}{6}\left[1 + \left(\frac{5}{6}\right)^{2} + \left(\frac{5}{6}\right)^{4} + \dots\right]$$

∴ A’s chance to B’s chance stands as 6 : 5. Hence their respective chances
are $\frac{6}{11}$ and $\frac{5}{11}$. As such their expectations are as under:

$$\text{A's expectation} = 44 \times \frac{6}{11} = \frac{264}{11} = \text{Rs } 24$$

$$\text{B's expectation} = 44 \times \frac{5}{11} = \frac{220}{11} = \text{Rs } 20$$
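The geometric series in Example 2 has a closed form: if each throw succeeds with probability p, the first player wins with probability p / (1 − (1 − p)²). The sketch below evaluates this with exact fractions; the function name is illustrative.

```python
from fractions import Fraction

def first_player_win_probability(p):
    """Sum of p * (1 - p)**(2k) for k = 0, 1, 2, ..., i.e. p / (1 - (1 - p)**2)."""
    q = 1 - p
    return p / (1 - q * q)

p = Fraction(1, 6)      # chance of throwing a 2 with one die
stake = 44              # rupees

chance_a = first_player_win_probability(p)
chance_b = 1 - chance_a
print(chance_a, chance_b, stake * chance_a, stake * chance_b)
```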


Example 3: A person has ten coins which he throws down in succession. He is to
receive one rupee if the first falls head, two rupees if the second also falls head,
four rupees if the third also falls head and so on, the amount doubling each time,
but as soon as a coin falls tail he ceases to receive anything. What is the value of
his expectation?


Solution:

Chance of falling head = $\frac{1}{2}$

Chance of falling tail = $\frac{1}{2}$

Chance of falling head in 1st trial = $\frac{1}{2}$

Expectation in 1st trial = $\frac{1}{2} \times 1$ = Re $\frac{1}{2}$

Now, only if he succeeds in getting head in the first trial is he allowed to
do his second trial.

∴ His chance of success in 2nd trial = $\frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$

His expectation in the 2nd trial = $\frac{1}{4} \times 2$ = Re $\frac{1}{2}$

Similarly, his expectation in the third trial = $\left(\frac{1}{2}\right)^{3} \times 2^{2}$ = Re $\frac{1}{2}$

and so on up to the 10th trial.

His expectation in the 10th trial = $\left(\frac{1}{2}\right)^{10} \times 2^{9}$ = Re $\frac{1}{2}$

∴ Total expectation (or Expected Monetary Value or EMV)

$$= \frac{1}{2} + \frac{1}{2} + \dots \text{ (10 terms)} = 10 \times \frac{1}{2} = \text{Rs } 5$$
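The ten per-trial expectations in Example 3 can be tabulated exactly. This is a short sketch with exact fractions: trial k pays 2^(k−1) rupees and is reached and won with probability (1/2)^k.

```python
from fractions import Fraction

# Expectation contributed by trial k: probability (1/2)**k that the first k
# coins all fall head, times the payment of 2**(k - 1) rupees for the k-th head.
per_trial = [Fraction(1, 2) ** k * 2 ** (k - 1) for k in range(1, 11)]
total_expectation = sum(per_trial)
print(per_trial[0], per_trial[-1], total_expectation)
```

Each trial contributes the same half rupee because the doubling of the payment exactly offsets the halving of the probability.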


Example 4: In a given business venture a man can make a profit of Rs 2000 with
probability 0.8 or suffer a loss of Rs 800 with probability 0.2. Determine his
expectation.


Solution: With probability 0.8, the expectation of profit is,

$$\frac{8}{10} \times 2000 = \text{Rs } 1600$$

With probability 0.2, the expectation of loss is,

$$\frac{2}{10} \times 800 = \text{Rs } 160$$

His overall net expectation in the venture concerned would then clearly be,

Rs 1600 − 160 = Rs 1440

Thus in the above two examples the concept of mathematical expectation
has been extended to a discrete random variable X which can assume the values
$X_1, X_2, \dots, X_n$ with respective probabilities $p_1, p_2, \dots, p_n$, where
$p_1 + p_2 + \dots + p_n = 1$. The mathematical expectation of X, denoted by E(X), is
defined as,

$$E(X) = p_1 X_1 + p_2 X_2 + \dots + p_n X_n$$
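The definition translates directly into code. The sketch below is a minimal illustration; the function name and the tolerance used to check that the probabilities sum to 1 are assumptions. It is applied to Example 4 by treating the loss as a negative value.

```python
def expectation(values, probabilities):
    """E(X) = p_1 X_1 + p_2 X_2 + ... + p_n X_n for a discrete random variable."""
    if len(values) != len(probabilities):
        raise ValueError("values and probabilities must have the same length")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * x for x, p in zip(values, probabilities))

# Example 4: profit of Rs 2000 with probability 0.8, loss of Rs 800 with probability 0.2.
net_expectation = expectation([2000, -800], [0.8, 0.2])
print(net_expectation)
```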


Check Your Progress 
1. Give two properties of the test for independence.
2. Explain the concept of test statistics.


3. What is expected number?


11.4 ANSWERS TO CHECK YOUR PROGRESS
QUESTIONS


1. The following are properties of the test for independence:
(a) The data are the observed frequencies.
(b) The data are arranged into a contingency table.


2. In the test for a given population variance, the variance is the square of the
standard deviation, so whatever you say about a variance can be, for all practical
purposes, extended to a population standard deviation.


3. If p is the probability of the happening of an event in a single
trial, then the expected number of occurrences of that event in n trials is
given by n·p, where n means the number of trials and p means the
probability of happening of the event.


11.5 SUMMARY


• The degrees of freedom are the degrees of freedom for the row variable
times the degrees of freedom for the column variable. It is not one less
than the sample size; it is the product of the two degrees of freedom.

• The expected value is computed by taking the row total times the column
total and dividing by the grand total.

• The value of the test statistic doesn’t change if the order of the rows or
columns is switched.

• The value of the test statistic doesn’t change if the rows and columns are
interchanged (transpose of the matrix).

• If the probability p is determined as the relative frequency in n trials, then the
mathematical expectation in these n trials would be equal to the actual
(observed) number of successes in these n trials.


11.6 KEY WORDS


• Probability distribution: In probability theory and statistics, a probability
distribution is the mathematical function that gives the probabilities of
occurrence of different possible outcomes for an experiment.


• Null hypothesis: $H_0: \sigma^{2} = \sigma_0^{2}$, tested against the alternative $H_1: \sigma^{2} > \sigma_0^{2}$.


• Test statistic: $\chi^{2} = \dfrac{ns^{2}}{\sigma_0^{2}} = \dfrac{\sum (x_i - \bar{x})^{2}}{\sigma_0^{2}}$


11.7 SELF-ASSESSMENT QUESTIONS AND
EXERCISES


Short-Answer Questions 


1. Give the formula to calculate the expected frequency in any cell. 
2. In 500 trials of a throw of two fair dice, what is the expected number of
times that the sum will be less than 3?


Long-Answer Questions


1. Describe the distributions of $\bar{X}$ and $ns^2/\sigma^2$.


2. Briefly explain the expectations of functions of random variables.
12.0 INTRODUCTION


A ‘Limiting Distribution’ is also termed an ‘Asymptotic Distribution’.
Fundamentally, it is defined as the hypothetical distribution to which a
sequence of distributions converges. Since it is hypothetical, it is not considered a
distribution in the ordinary sense. Asymptotic distribution theory is typically used to
find the limiting distribution of a sequence of distributions. The modes of convergence
for a sequence of random variables are defined on the basis of convergence in
probability and convergence in distribution. The concept of convergence leads us to
the two fundamental results of probability theory, the Law of Large Numbers and the
Central Limit Theorem (CLT). Limiting probability distributions are widely used to
find appropriate sample sizes. When a sample size is large enough, a
statistic’s distribution will approximate its limiting distribution, assuming that such a
distribution exists. In probability theory, there exist several different notions of
convergence of random variables. The convergence of sequences of random variables
to some limit random variable is an important concept in probability theory and in its
applications to statistics and stochastic processes. The same concepts are known
in more general mathematics as ‘Stochastic Convergence’.


In this unit, you will study about limiting distributions, convergence in
distribution and convergence in probability.
12.1 OBJECTIVES


After going through this unit, you will be able to: 
e Understand the basic concept of limiting distributions 
e Analyse what convergence in distribution is 


e Explain about the convergence in probability 


12.2 LIMITING DISTRIBUTIONS 


A ‘Limiting Distribution’ is also termed an ‘Asymptotic Distribution’.
Fundamentally, it is defined as the hypothetical distribution to which a
sequence of distributions converges. Since it is hypothetical, it is not considered a
distribution in the ordinary sense. Asymptotic distribution theory is typically used to
find the limiting distribution of a sequence of distributions.


Some of the limiting or asymptotic distributions are well-recognized and
defined by different statisticians. For example, the sampling distribution of the
t-statistic will converge to a standard normal distribution if the sample size is large
enough.


In basic statistics, the process includes taking a random sample of observations
and then fitting that data to a known distribution such as the normal distribution
or t distribution. Fitting the statistical data accurately to a known distribution is
generally a very challenging task because of the limited sample sizes. The accurate
approximation is based on the estimation of ‘presumptions and guesses’ established
on the nature of large sample statistics. The limiting/asymptotic distribution can be
applied to small, finite samples for approximating the true distribution of a random
variable.


Limiting probability distributions are widely used to find appropriate
sample sizes. When a sample size is large enough, a statistic’s distribution will
approximate its limiting distribution, assuming that such a distribution exists.


The Central Limit Theorem (CLT) uses the limit concept for describing the
behaviour of sample means. The CLT states that the sampling distribution of the
sample mean approaches a normal distribution as the sample size increases,
irrespective of the shape of the population distribution (provided its variance is
finite). For example, if you consider large samples then the graph of the sample means
will be similar to a normal distribution, even if the population distribution is skewed
or otherwise non-normal. In other words, the ‘Limiting Distribution’ for a large set of
sample means is the ‘Normal Distribution’.
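The behaviour described by the CLT can be seen in a small simulation. The following is a sketch under stated assumptions: the population is taken to be exponential with mean 1 (a strongly skewed distribution), the sample size is 50, and the seed, replication count and helper names are arbitrary illustrative choices. The sample means are far less skewed than the population, and their spread is close to 1/√50.

```python
import random

def sample_means(sample_size, replications, rng):
    """Means of `replications` samples, each of `sample_size` exponential(1) draws."""
    return [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(replications)
    ]

def mean(values):
    return sum(values) / len(values)

def std_dev(values):
    mu = mean(values)
    return (sum((v - mu) ** 2 for v in values) / len(values)) ** 0.5

def skewness(values):
    """Third standardised moment; close to 0 for a symmetric, normal-like shape."""
    mu, sd = mean(values), std_dev(values)
    return mean([((v - mu) / sd) ** 3 for v in values])

rng = random.Random(12345)                   # fixed seed so the run is repeatable
single_draws = sample_means(1, 4000, rng)    # the skewed population itself
means_of_50 = sample_means(50, 4000, rng)    # sample means for n = 50

print("skewness of single draws:", round(skewness(single_draws), 2))
print("skewness of means, n = 50:", round(skewness(means_of_50), 2))
print("std dev of means, n = 50:", round(std_dev(means_of_50), 3),
      "compared with 1/sqrt(50) =", round(50 ** -0.5, 3))
```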


Definition: Suppose $X_n$ is a random sequence with Cumulative Distribution
Function (CDF) $F_n(x)$ and X is a random variable with CDF F(x). If $F_n$ converges
to F as $n \to \infty$ (for all points where F(x) is continuous), then the distribution of $X_n$
converges to that of X. This distribution is called the limiting distribution of $X_n$.


In simpler terms, it can be stated that the limiting probability distribution of
$X_n$ is the limiting distribution of some function of $X_n$.


Convergence of Random Variables 


In probability theory, there exist several different notions of convergence of random 
variables. The convergence of sequences of random variables to some limit random 
variable is an important concept in probability theory, and its applications to statistics 
and stochastic processes. The same concepts are known in more general 
mathematics as ‘Stochastic Convergence’ and they formalize the idea that a 
sequence of essentially random or unpredictable events can sometimes be expected 
to settle down into a behaviour that is essentially unchanging when items far enough 
into the sequence are studied. The different possible notions of convergence relate 
to how such a behaviour can be characterized: two readily understood behaviours 
are that the sequence eventually takes a constant value, and that values in the 
sequence continue to change but can be described by an unchanging probability 
distribution. 


‘Stochastic Convergence’ validates the notion that a sequence of essentially
random or unpredictable events can sometimes be expected to settle into a pattern.
The pattern may be,


• Convergence in the classical sense to a fixed value, perhaps itself coming
from a random event.
• An increasing similarity of outcomes to what a purely deterministic
function would produce.
• An increasing preference towards a certain outcome.
• An increasing ‘aversion’ against straying far away from a certain outcome.
• That the probability distribution describing the next outcome may grow
increasingly similar to a certain distribution.


Some more theoretical patterns state that the,


• Series formed by calculating the expected value of the outcome’s distance
from a particular value may converge to 0.
• Variance of the random variable describing the next event grows smaller
and smaller.


The above mentioned facts define the convergence of a single series to a
limiting value. The notion of the convergence of two series towards each other is
also significant; in that case the sequence is defined as either the difference or the
ratio of the two series.


If the average of n independent random variables $Y_i$, for i = 1, ..., n, all
having the same finite mean and variance, is given by,

$$X_n = \frac{1}{n} \sum_{i=1}^{n} Y_i$$

then as n tends to infinity, $X_n$ converges in probability to the common mean,
$\mu$, of the random variables $Y_i$. This result is known as the ‘Weak Law of Large
Numbers’. Other forms of convergence are important in other useful theorems,
including the Central Limit Theorem (CLT).
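The Weak Law of Large Numbers can be illustrated by simulation. The sketch below assumes that the Y_i are rolls of a fair die (common mean 3.5); the tolerance, seed, replication count and function name are illustrative choices. It estimates how often the average of n rolls differs from 3.5 by more than 0.25, and that relative frequency shrinks as n grows.

```python
import random

def exceedance_frequency(n, eps, replications, rng):
    """Fraction of replications in which |average of n die rolls - 3.5| > eps."""
    count = 0
    for _ in range(replications):
        average = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(average - 3.5) > eps:
            count += 1
    return count / replications

rng = random.Random(2024)   # fixed seed so the run is repeatable
eps = 0.25
frequencies = {n: exceedance_frequency(n, eps, 300, rng) for n in (10, 100, 1000)}
for n, freq in frequencies.items():
    print(f"n = {n:5d}: estimated Pr(|X_n - 3.5| > {eps}) = {freq:.3f}")
```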


Here we assume that $(X_n)$ is a sequence of random variables, that X is a
random variable, and that all of these are defined on the same probability space
$(\Omega, \mathcal{F}, \Pr)$.


12.2.1 Convergence in Distribution 


With convergence in distribution, we increasingly expect the next outcome in a
sequence of random experiments to be better and better modelled by a given
probability distribution.


Convergence in distribution is typically considered as the weakest form of 
convergence, since it is implied by all other types of convergence. However, 
convergence in distribution is very frequently used in practice; most often it arises 
from application of the Central Limit Theorem (CLT). 

Definition: A sequence $X_1, X_2, \dots$ of real-valued random variables is said
to converge in distribution, or converge weakly, or converge in law to a random
variable X if,

$$\lim_{n \to \infty} F_n(x) = F(x)$$

for every number $x \in \mathbb{R}$ at which F is continuous. Here $F_n$ and F are the
Cumulative Distribution Functions (CDFs) of the random variables $X_n$ and X,
respectively.

Essentially, only the continuity points of F should be considered. For
example, if the $X_n$ are distributed uniformly on the intervals (0, 1/n), then this sequence
converges in distribution to a degenerate random variable X = 0. Certainly, we
can state that $F_n(x) = 0$ for all n when $x \le 0$, and $F_n(x) = 1$ for all $x \ge 1/n$ when
n > 0. However, for this limiting random variable F(0) = 1, even though $F_n(0) = 0$
for all n.

Thus the convergence of CDFs fails at the point x = 0 where F is
discontinuous.
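The uniform example can be made concrete by coding both distribution functions. This is a minimal sketch with illustrative function names: at every x other than 0 the values F_n(x) eventually equal F(x), whereas at the discontinuity point x = 0 they never do.

```python
def cdf_uniform(x, n):
    """CDF F_n of X_n, which is uniform on the interval (0, 1/n)."""
    if x <= 0:
        return 0.0
    if x >= 1.0 / n:
        return 1.0
    return n * x

def cdf_limit(x):
    """CDF F of the degenerate limit X = 0."""
    return 1.0 if x >= 0 else 0.0

for x in (-0.5, 0.0, 0.01, 0.2):
    values = [cdf_uniform(x, n) for n in (1, 10, 100, 1000)]
    print(f"x = {x:5.2f}: F_n(x) for n = 1, 10, 100, 1000 -> {values}; F(x) = {cdf_limit(x)}")
```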


Convergence in distribution may be denoted as,

$$X_n \xrightarrow{d} X, \quad X_n \xrightarrow{\mathcal{D}} X, \quad X_n \xrightarrow{\mathcal{L}} X, \quad X_n \xrightarrow{d} \mathcal{L}_X,$$

$$X_n \rightsquigarrow X, \quad X_n \Rightarrow X, \quad \mathcal{L}(X_n) \to \mathcal{L}(X),$$

where $\mathcal{L}_X$ is the probability distribution law of X. For example, if X is
standard normal we can write $X_n \xrightarrow{d} \mathcal{N}(0, 1)$.


For random vectors $\{X_1, X_2, \dots\} \subset \mathbb{R}^{k}$ the convergence in distribution can
be similarly defined. We say that this sequence converges in distribution to a random
k-vector X if,

$$\lim_{n \to \infty} \Pr(X_n \in A) = \Pr(X \in A)$$

for every $A \subset \mathbb{R}^{k}$ which is a continuity set of X.


The definition of convergence in distribution may be extended from random
vectors to more general random elements in arbitrary metric spaces, and even to
‘random variables’ which are not measurable, a situation which
occurs, for example, in the study of empirical processes. This is referred to as the
‘Weak Convergence of Laws without Laws Being Defined’ except asymptotically.


In this case the term ‘Weak Convergence’ is preferably used and we say
that a sequence of random elements $\{X_n\}$ converges weakly to X (denoted
as $X_n \Rightarrow X$) if,

$$E^{*} h(X_n) \to E\, h(X)$$

for all continuous bounded functions h. Here $E^{*}$ denotes the outer
expectation, that is, the expectation of a ‘Smallest Measurable Function g that
Dominates $h(X_n)$’.


Convergence in distribution can be illustrated by the example of a newly built
dice factory. Suppose a new dice factory has just been built. The first
few dice come out quite biased, due to imperfections in the production process.
The outcome from tossing any of them will follow a distribution markedly different
from the desired uniform distribution.


As the production in the factory improves, the dice become less and less 
loaded, and the outcomes from tossing a newly produced die will follow the uniform 
distribution more and more closely. 


12.2.2 Convergence in Probability 


The basic notion behind this type of convergence is that the probability of an 
‘unusual’ outcome becomes smaller and smaller as the sequence progresses. 


The concept of convergence in probability is used very often in statistics. 
For example, an estimator is called consistent if it converges in probability to the 
quantity being estimated. Convergence in probability is also the type of convergence 
established by the ‘Weak Law of Large Numbers’. 


Definition: A sequence $\{X_n\}$ of random variables Converges in
Probability towards the random variable X if for all $\varepsilon > 0$,

$$\lim_{n \to \infty} \Pr(|X_n - X| > \varepsilon) = 0.$$

More explicitly, let $P_n$ be the probability that $X_n$ is outside the ball of
radius $\varepsilon$ centred at X. Then $X_n$ is said to converge in probability to X if for any
$\varepsilon > 0$ and any $\delta > 0$ there exists a number N (which may depend on $\varepsilon$ and $\delta$) such
that for all $n \ge N$, $P_n < \delta$ (the Definition of Limit).
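The ε–δ form of the definition can be traced on a concrete case. The sketch below takes X_n uniform on (0, 1/n) and X = 0 as an assumed example, for which P_n = Pr(|X_n − 0| > ε) = max(0, 1 − nε) exactly, and searches for the smallest N that works for a given ε and δ. Exact fractions are used so that boundary cases are not disturbed by rounding; the function names are illustrative.

```python
from fractions import Fraction

def prob_outside(n, eps):
    """P_n = Pr(|X_n - 0| > eps) when X_n is uniform on (0, 1/n)."""
    return max(Fraction(0), 1 - n * eps)

def smallest_n(eps, delta):
    """Smallest N such that P_n < delta for all n >= N (P_n is non-increasing in n)."""
    n = 1
    while prob_outside(n, eps) >= delta:
        n += 1
    return n

eps = Fraction(1, 100)
delta = Fraction(1, 20)
N = smallest_n(eps, delta)
print(N, prob_outside(N - 1, eps), prob_outside(N, eps))
```

Because a suitable N exists for every ε > 0 and δ > 0 in this example, X_n converges in probability to 0.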

Notice that for the condition to be satisfied, for each n the random
variables X and $X_n$ cannot be independent, and thus convergence in probability is
a condition on the joint CDFs, as opposed to convergence in distribution, which
is a condition on the individual CDFs, unless X is deterministic, as in the ‘Weak
Law of Large Numbers’. At the same time, the case of a deterministic X cannot,
whenever the deterministic value is a discontinuity point (not isolated), be handled
by convergence in distribution, where discontinuity points have to be explicitly
excluded.

Convergence in probability is denoted by adding the letter ‘p’ over an arrow
indicating convergence, or using the ‘plim’ probability limit operator:

$$X_n \xrightarrow{p} X, \quad X_n \xrightarrow{P} X, \quad \operatorname*{plim}_{n \to \infty} X_n = X$$

For random elements $\{X_n\}$ on a separable metric space (S, d), convergence
in probability is defined similarly by,

$$\forall \varepsilon > 0, \quad \Pr\big(d(X_n, X) \ge \varepsilon\big) \to 0.$$
Properties 


• ‘Convergence in Probability’ implies ‘Convergence in Distribution’.
• In the opposite direction, convergence in distribution implies convergence
in probability when the limiting random variable X is a constant.
• Convergence in probability does not imply almost sure convergence.
• The continuous mapping theorem states that for every continuous
function g(·), if $X_n \xrightarrow{p} X$, then also $g(X_n) \xrightarrow{p} g(X)$.


Check Your Progress 


1. Explain the basic concept of a limiting distribution.

2. When can the limiting/asymptotic distribution be applied?

3. Why are limiting probability distributions used in statistics?

4. Elucidate on convergence in distribution.


5. Explain the concept of convergence in probability. Why is it often used in
statistics?


12.3 ANSWERS TO CHECK YOUR PROGRESS
QUESTIONS


1. A ‘Limiting Distribution’ is also termed an ‘Asymptotic Distribution’.
Fundamentally, it is defined as the hypothetical distribution to which a
sequence of distributions converges. Asymptotic distribution theory is typically
used to find the limiting distribution of a sequence of distributions. Some of the
limiting or asymptotic distributions are well-recognized and defined by
different statisticians. For example, the sampling distribution of the t-statistic
will converge to a standard normal distribution if the sample size is large
enough.


2. The limiting/asymptotic distribution can be applied on small, finite samples 
for approximating the true distribution ofa random variable. 


3. Limiting probability distributions are used to find appropriate
sample sizes. When a sample size is large enough, a statistic’s distribution
will approach a limiting distribution, assuming that such a distribution exists. In
statistics, the process includes taking a random sample of observations and then
fitting that data to a known distribution such as the normal distribution or
t distribution. Fitting the data accurately to a known distribution is
generally a very challenging task because of limited sample sizes.


4. Under convergence in distribution, the next outcome in a sequence of random
experiments is expected to be modelled better and better by a given probability
distribution. Convergence in distribution is typically considered the weakest
form of convergence, since it is implied by all the other types of convergence
discussed in this unit.


5. The basic notion behind the convergence in probability type is that the 
probability of an ‘unusual’ outcome becomes smaller and smaller as the 
sequence progresses. 


The concept of convergence in probability is used very often in statistics. 
For example, an estimator is called consistent if it converges in probability 
to the quantity being estimated. 


12.4 SUMMARY 


• A ‘Limiting Distribution’ is also termed an ‘Asymptotic Distribution’.
Fundamentally, it is defined as the hypothetical distribution, or limit of
convergence, of a sequence of distributions.

• Asymptotic distribution theory is typically used to find a limiting
distribution for a sequence of distributions.

• Some limiting or asymptotic distributions are well recognized and have been
defined by different statisticians.

• In basic statistics, the process includes taking a random sample of observations
and then fitting that data to a known distribution such as the normal
distribution or t distribution.

• An accurate approximation relies on ‘presumptions and guesses’ based on
the nature of large sample statistics.

• The limiting/asymptotic distribution can be applied to small, finite samples
to approximate the true distribution of a random variable.

• Limiting probability distributions are used to find appropriate
sample sizes. When a sample size is large enough, a statistic’s distribution
will approach a limiting distribution, assuming that such a distribution exists.

• The Central Limit Theorem (CLT) uses the limit concept to describe the
behaviour of sample means. The CLT states that the sampling distribution
of the sample mean approaches a normal distribution as the sample size
increases, irrespective of the shape of the population distribution.

• In simpler terms, the limiting probability distribution of $X_n$
is the limiting distribution of some function of $X_n$.

• The different possible notions of convergence relate to how such a behaviour
can be characterized: two readily understood behaviours are that the
sequence eventually takes a constant value, and that values in the sequence
continue to change but can be described by an unchanging probability
distribution.

• ‘Stochastic convergence’ formalizes the notion that a sequence of essentially
random or unpredictable events can sometimes be expected to settle into a
pattern.

• The probability distribution describing the next outcome may grow
increasingly similar to a certain distribution.

• The series formed by calculating the expected value of the outcome’s distance
from a particular value may converge to 0.

• The variance of the random variable describing the next event may grow smaller
and smaller.

• Under convergence in distribution, the next outcome in a sequence of random
experiments is expected to be modelled better and better by a given probability
distribution.

• Convergence in distribution is typically considered the weakest form of
convergence, since it is implied by all the other types of convergence
discussed in this unit.

• However, convergence in distribution is very frequently used in practice;
most often it arises from application of the Central Limit Theorem (CLT).

• The definition of convergence in distribution may be extended from random
vectors to more general random elements in arbitrary metric spaces, and
even to ‘random variables’ which are not measurable, a situation which
occurs, for example, in the study of empirical processes.

• The basic notion behind convergence in probability is that the probability of
an ‘unusual’ outcome becomes smaller and smaller as the sequence
progresses.

• The concept of convergence in probability is used very often in statistics.
For example, an estimator is called consistent if it converges in probability
to the quantity being estimated.

• Convergence in probability is denoted by adding the letter ‘p’ over an arrow
indicating convergence, or using the ‘plim’ probability limit operator:

$$X_n \xrightarrow{p} X, \qquad X_n \xrightarrow{P} X, \qquad \underset{n \to \infty}{\operatorname{plim}}\, X_n = X.$$

• ‘Convergence in Probability’ implies ‘Convergence in Distribution’.

• In the opposite direction, convergence in distribution implies convergence
in probability when the limiting random variable X is a constant.


12.5 KEY WORDS 


• Limiting distribution: A ‘Limiting Distribution’ is also termed an
‘Asymptotic Distribution’; it is defined as the hypothetical distribution, or
limit of convergence, of a sequence of distributions.

• Limiting probability distributions: The limiting probability distributions
are used to find appropriate sample sizes.

• Central Limit Theorem (CLT): The Central Limit Theorem (CLT) uses
the limit concept to describe the behaviour of sample means; it states that
the sampling distribution of the sample mean approaches a normal
distribution as the sample size increases, irrespective of the shape of the
population distribution.

• Stochastic convergence: Stochastic convergence formalizes the notion that
a sequence of essentially random or unpredictable events can sometimes
be expected to settle into a pattern.

• Convergence in distribution: Under convergence in distribution, the next
outcome in a sequence of random experiments is expected to be modelled
better and better by a given probability distribution.


EXERCISES


Short-Answer Questions


1. What is a limiting distribution?

2. Define the term convergence.

3. What is stochastic convergence?

4. Give the definition of convergence in distribution.

5. Define the basic concept of convergence in probability.

6. State the properties of convergence.


Long-Answer Questions 


1. Briefly discuss the significance of limiting distributions in the field of probability
and statistics.

2. Explain the convergence of random variables, giving appropriate examples.

3. Discuss the concept of convergence in distribution, giving its definition and
appropriate examples.

4. Why does the Central Limit Theorem (CLT) use the limit concept to describe
the behaviour of sample means? Explain, giving appropriate examples.

5. Analyse and discuss the significance of convergence in probability, giving
its definition and appropriate examples.

6. Is N(0, 1/n) close to the N(1/n, 1/n) distribution? Explain.

7. ‘$X_n$ converges in distribution to X.’ Justify the statement with appropriate
proof.


13.0 INTRODUCTION


In probability theory, the Central Limit Theorem (CLT) establishes that, in some 
situations, when independent random variables are added, their properly normalized 
sum tends toward a normal distribution (informally a bell curve) even if the original 
variables themselves are not normally distributed. The theorem is a key concept in 
probability theory because it implies that probabilistic and statistical methods that 
work for normal distributions can be applicable to many problems involving other 
types of distributions. 


In this unit, you will study the limiting moment generating function and the
central limit theorem.


13.1 OBJECTIVES


After going through this unit, you will be able to:
• Explain the limiting moment generating function

• Understand the central limit theorem


13.2 LIMITING MOMENT GENERATING 
FUNCTIONS 


Binomial, Poisson, negative binomial and discrete uniform distributions are some of the
discrete probability distributions. The random variables in these distributions assume
a finite or enumerably infinite number of values, but in nature there are random
variables which take an uncountably infinite number of values, i.e., these variables
can take any value in an interval. Such variables are known as continuous random
variables, and their probability distributions are known as continuous probability
distributions.


A random variable X is said to be normally distributed if it has the following
probability density function:


$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad \text{for } -\infty < x < \infty$$


where μ and σ > 0 are the parameters of the distribution.


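Normal Curve: A curve given by,


$$y_x = y_0\, e^{-\frac{x^2}{2\sigma^2}}$$


is known as the normal curve when the origin is taken at the mean. Here
$y_0 = \frac{1}{\sigma\sqrt{2\pi}}$ is the ordinate at the mean, in agreement with the
density function given above.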


Fig. 13.1 Normal Curve 


Standard Normal Variate: A normal variate with mean zero and standard
deviation unity is called a standard normal variate.


That is, if X is a standard normal variate then E(X) = 0 and V(X) = 1.
Then, X ~ N(0, 1).


The moment generating function or MGF of a standard normal variate is
given as follows:


$$M_X(t) = E\left(e^{tX}\right) = e^{t^2/2}$$


Frequently, a change of variable in the integral


$$\frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(x-\mu)^2/2\sigma^2}\, dx$$


is used by introducing the following new variable:

$$Z = \frac{X-\mu}{\sigma} \sim N(0, 1)$$


This new random variable Z simplifies calculations of probabilities etc.
concerning normally distributed variates.

Standard Normal Distribution: The distribution of a random variable
$Z = \frac{X-\mu}{\sigma}$, which is known as the standard normal variate, is called
the standard normal distribution or unit normal distribution, where X has a
normal distribution with mean μ and variance σ².


The density function of Z is given as follows:


$$\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty$$


with mean 0, variance one and MGF $e^{t^2/2}$. The normal distribution is the most frequently
used distribution in statistics. Its importance is highlighted by the
central limit theorem and by its mathematical properties. Measurements such as height,
weight, the blood pressure of normal individuals, heart diameter, etc.
all approximately follow a normal distribution if the number of observations is very large.
The normal distribution also has great importance in statistical inference theory.


Examples of Normal Distribution: 


1. The heights of men of mature age belonging to the same race and living in
similar environments provide a normal frequency distribution.


2. The heights of trees of the same variety and age in the same locality would
conform to the laws of the normal curve.


3. The lengths of the leaves of a tree form a normal frequency distribution. Though
some of them are very short and some are long, they tend towards
their mean length.


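Example 1: X has a normal distribution with μ = 50 and σ² = 25. Find out
(i) The approximate value of the probability density function for X = 50
(ii) The value of the distribution function for x = 50.


Solution: (i)

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}, \quad -\infty \le x \le \infty$$

For X = 50, σ² = 25, μ = 50, you have

$$f(50) = \frac{1}{5\sqrt{2\pi}} \approx 0.08.$$

(ii) Distribution function

$$F(x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(t-\mu)^2/2\sigma^2}\, dt$$

$$F(50) = \int_{-\infty}^{50} \frac{1}{5\sqrt{2\pi}}\, e^{-(t-50)^2/50}\, dt = 0.5,$$

since x = 50 is the mean and the normal curve is symmetric about its mean.
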

Example 2: If X is a normal variable with mean 8 and standard deviation 4, find
(i) P[X ≤ 5]
(ii) P[5 ≤ X ≤ 10]


Solution: (i)

$$P[X \le 5] = P\left(\frac{X-\mu}{\sigma} \le \frac{5-8}{4}\right)$$

[Figure: normal curve with X = μ (Z = 0), Z = 0.75 and the tail Z > 0.75 marked]

= P(Z ≤ −0.75)
= P(Z ≥ 0.75) [By symmetry]
= 0.5 − P(0 ≤ Z ≤ 0.75) [To use relevant table]
= 0.5 − 0.2734 [See Appendix for value of ‘Z’]
= 0.2266.

(ii)

$$P[5 \le X \le 10] = P\left(\frac{5-8}{4} \le Z \le \frac{10-8}{4}\right)$$

= P(−0.75 ≤ Z ≤ 0.5)
= P(−0.75 ≤ Z ≤ 0) + P(0 ≤ Z ≤ 0.5)
= P(0 ≤ Z ≤ 0.75) + P(0 ≤ Z ≤ 0.5)
= 0.2734 + 0.1915 [See Appendix]
= 0.4649.
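
The standardization step used in Example 2 can be sketched in code. This is a hedged illustration rather than the textbook's method: instead of the Appendix table it evaluates the standard normal distribution function through the error function `math.erf`, and the helper names `phi` and `normal_interval_prob` are assumptions made for this sketch.

```python
import math


def phi(z):
    """Standard normal distribution function, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_interval_prob(a, b, mu, sigma):
    """P(a <= X <= b) for X ~ N(mu, sigma^2), via Z = (X - mu) / sigma."""
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)


mu, sigma = 8.0, 4.0
p_i = phi((5 - mu) / sigma)                    # P[X <= 5] = P[Z <= -0.75]
p_ii = normal_interval_prob(5, 10, mu, sigma)  # P[5 <= X <= 10]
print(round(p_i, 4), round(p_ii, 4))
```

The computed values agree with the table-based answers 0.2266 and 0.4649 to about three decimal places; small differences in the last digit come from rounding in the printed table.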
Example 3: X is a normal variate with mean 30 and S.D. 5. Find
(i) P[26 ≤ X ≤ 40]


(ii) P[|X − 30| > 5]
Solution: Here μ = 30, σ = 5.


[Figure: normal curve for P[26 ≤ X ≤ 40], marking X = 26 (Z = −0.8), X = μ (Z = 0) and X = 40 (Z = 2)]


(i) When X = 26,

$$Z = \frac{X-\mu}{\sigma} = \frac{26-30}{5} = -0.8$$

And for X = 40,

$$Z = \frac{40-30}{5} = 2$$

P[26 ≤ X ≤ 40] = P[−0.8 ≤ Z ≤ 2]
= P[0 ≤ Z ≤ 0.8] + P[0 ≤ Z ≤ 2]
= 0.2881 + 0.4772 = 0.7653

(ii) P[|X − 30| > 5] = 1 − P[|X − 30| ≤ 5]
P[|X − 30| ≤ 5] = P[25 ≤ X ≤ 35]
= P[−1 ≤ Z ≤ 1]
= 2 P(0 ≤ Z ≤ 1)
= 2 × 0.3413 = 0.6826.
So P[|X − 30| > 5] = 1 − P[|X − 30| ≤ 5]
= 1 − 0.6826 = 0.3174.


13.3 THE CENTRAL LIMIT THEOREM (CLT)


The Central Limit Theorem (CLT) has several variants. In its common form, the
random variables must be independent and identically distributed. In variants,
convergence of the mean to the normal distribution also occurs for non-identical
distributions or for non-independent observations, if they comply with certain
conditions.


The earliest version of this theorem, that the normal distribution may be
used as an approximation to the binomial distribution, is the de Moivre-Laplace
theorem.


Let $X_1, X_2, \dots, X_n$ be n independent random variables all of which
have the same distribution. Let the common expectation and variance be
μ and σ², respectively.


Let

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

Then, the distribution of $\bar{X}$ approaches the normal distribution with mean μ
and variance $\frac{\sigma^2}{n}$ as n → ∞.


That is, the variate $Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ has, in the limit, the standard normal distribution.
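
The statement can be illustrated by simulation. The following is a minimal sketch under stated assumptions, not a proof: the common distribution is taken to be Uniform(0, 1) (mean 1/2, variance 1/12), the sample size is n = 30, and the function names and seed are illustrative. If Z is close to standard normal, roughly 95% of simulated values should fall in [−1.96, 1.96].

```python
import math
import random


def standardized_mean(n, rng):
    """Return Z = (sample mean - mu) / (sigma / sqrt(n)) for n Uniform(0, 1) draws."""
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    xbar = sum(rng.random() for _ in range(n)) / n
    return (xbar - mu) / (sigma / math.sqrt(n))


def coverage(n=30, reps=20000, seed=1):
    """Fraction of simulated Z values falling in [-1.96, 1.96]."""
    rng = random.Random(seed)
    inside = sum(1 for _ in range(reps) if abs(standardized_mean(n, rng)) <= 1.96)
    return inside / reps


if __name__ == "__main__":
    print(coverage())
```

Although the underlying variables are uniform rather than normal, the fraction printed should be close to 0.95.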


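Proof: The moment generating function of Z about the origin is given as follows:


$$M_Z(t) = E\left(e^{tZ}\right) = E\left[e^{t(\bar{X}-\mu)/(\sigma/\sqrt{n})}\right]$$

$$= e^{-\mu t\sqrt{n}/\sigma}\, E\left(e^{t\bar{X}\sqrt{n}/\sigma}\right)$$

$$= e^{-\mu t\sqrt{n}/\sigma}\, E\left[e^{\frac{t}{\sigma\sqrt{n}}(X_1 + X_2 + \dots + X_n)}\right]$$

$$= e^{-\mu t\sqrt{n}/\sigma}\, M_{X_1 + X_2 + \dots + X_n}\left(\frac{t}{\sigma\sqrt{n}}\right)$$

$$= e^{-\mu t\sqrt{n}/\sigma} \left[M_X\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^n$$

This is because the random variables are independent and have the same
MGF. By using logarithms, you have:


$$\log M_Z(t) = \frac{-\mu t\sqrt{n}}{\sigma} + n \log M_X\left(\frac{t}{\sigma\sqrt{n}}\right)$$

$$= \frac{-\mu t\sqrt{n}}{\sigma} + n \log\left[1 + \frac{\mu'_1 t}{\sigma\sqrt{n}} + \frac{\mu'_2}{2!}\left(\frac{t}{\sigma\sqrt{n}}\right)^2 + \dots\right]$$

$$= \frac{-\mu t\sqrt{n}}{\sigma} + n\left[\frac{\mu'_1 t}{\sigma\sqrt{n}} + \frac{\mu'_2 t^2}{2!\, n\sigma^2} + \dots - \frac{1}{2}\left(\frac{\mu'_1 t}{\sigma\sqrt{n}} + \dots\right)^2 + \dots\right]$$

$$= \frac{-\mu t\sqrt{n}}{\sigma} + \frac{\mu'_1 t\sqrt{n}}{\sigma} + \frac{\mu'_2 t^2}{2\sigma^2} - \frac{\mu'^2_1 t^2}{2\sigma^2} + \dots$$

$$= \frac{t^2}{2} + O\left(n^{-1/2}\right)$$

where $\mu'_1 = \mu$ and $\mu'_2$ are the first two moments of X about the origin, so that
$\mu'_2 - \mu'^2_1 = \sigma^2$.


Hence, as n → ∞,

$$\log M_Z(t) \to \frac{t^2}{2}, \quad \text{i.e.} \quad M_Z(t) \to e^{t^2/2}$$


However, this is the M.G.F. of a standard normal random variable. Thus,
the random variable Z converges to N(0, 1).


It follows that the limiting distribution of $\bar{X}$ is normal with mean μ and
variance $\frac{\sigma^2}{n}$.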
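
The limit reached in the proof can be checked numerically for a specific distribution. The sketch below is an illustration under an assumption that is not in the original text: the $X_i$ are taken to be Bernoulli(p) variables, whose MGF is $M_X(s) = 1 - p + p e^{s}$. It evaluates the logarithmic form of the MGF of Z used in the proof and compares the result with $e^{t^2/2}$. The function name `mgf_standardized_mean` is hypothetical.

```python
import math


def mgf_standardized_mean(t, n, p=0.3):
    """M_Z(t) for Z = (Xbar - mu) / (sigma / sqrt(n)), with X_i ~ Bernoulli(p)."""
    mu = p
    sigma = math.sqrt(p * (1.0 - p))
    s = t / (sigma * math.sqrt(n))
    # log M_Z(t) = -mu * t * sqrt(n) / sigma + n * log M_X(s),
    # where M_X(s) = 1 - p + p * e^s for a Bernoulli(p) variable.
    log_mgf = -mu * t * math.sqrt(n) / sigma + n * math.log(1.0 - p + p * math.exp(s))
    return math.exp(log_mgf)


if __name__ == "__main__":
    t = 1.0
    for n in (10, 1000, 100000):
        # Second column approaches the third column, e^{t^2 / 2}, as n grows.
        print(n, mgf_standardized_mean(t, n), math.exp(t * t / 2.0))
```

The gap between the two columns shrinks roughly like $n^{-1/2}$, matching the order of the remainder term in the proof.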


Check Your Progress 


1. What is a standard normal variate?


2. Define the term standard normal distribution.


13.4 ANSWERS TO CHECK YOUR PROGRESS
QUESTIONS


1. A normal variate with mean zero and standard deviation unity is called a
standard normal variate.


2. The distribution of a random variable $Z = \frac{X-\mu}{\sigma}$, which is known as the
standard normal variate, is called the standard normal distribution or unit
normal distribution, where X has a normal distribution with mean μ and
variance σ².


13.5 SUMMARY 


• Binomial, Poisson, negative binomial and uniform distribution are some of
the discrete probability distributions.

• A normal variate with mean zero and standard deviation unity is called a
standard normal variate.

• The heights of men of mature age belonging to the same race and living in
similar environments provide a normal frequency distribution.

• The lengths of the leaves of a tree form a normal frequency distribution. Though
some of them are very short and some are long, they tend towards
their mean length.


13.6 KEY WORDS


• Variants: In variants, convergence of the mean to the normal distribution
also occurs for non-identical distributions or for non-independent
observations, if they comply with certain conditions.


• Standard normal variate: A normal variate with mean zero and standard
deviation unity is called a standard normal variate.


13.7 SELF-ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 


1. What is meant by a normal curve?
2. Give the density function of Z.


3. Write some examples of normal distribution.

Long-Answer Questions


1. Briefly discuss limiting moment generating functions.
2. Explain the moment generating function of a standard normal variate.


3. Discuss the central limit theorem. Give examples.

14.2.2 Strong Law of Large Numbers

14.4 Summary

14.5 Key Words

14.6 Self-Assessment Questions and Exercises

14.7 Further Readings

14.0 INTRODUCTION 


‘Limit theorems’ is a general name for a number of theorems in probability theory that
give conditions for the appearance of some regularity as the result of the action of a
large number of random sources. The first limit theorems, established by J. Bernoulli (1713) and
P. Laplace (1812), are related to the distribution of the deviation of the frequency
μ/n of appearance of some event E in n independent trials from its
probability p, 0 < p < 1 (exact statements can be found in the articles Bernoulli
theorem; Laplace theorem). S. Poisson (1837) generalized these theorems to the
case when the probability $p_k$ of appearance of E in the k-th trial depends on k,
by writing down the limiting behaviour, as n → ∞, of the distribution of the deviation
of μ/n from the arithmetic mean $\bar{p} = (p_1 + \dots + p_n)/n$ of the probabilities $p_k$,
1 ≤ k ≤ n (cf. Poisson theorem).


In this unit, you will study some theorems on limiting distributions.


14.1 OBJECTIVES 


After going through this unit, you will be able to: 
• Analyse some theorems on limiting distributions


• Understand some laws of large numbers


14.2 SOME THEOREMS ON LIMITING
DISTRIBUTIONS


Before starting the laws of large numbers, let us define an inequality named
Kolmogorov’s inequality. The set of Kolmogorov’s inequalities was defined by
Kolmogorov in 1928.

Suppose $X_1, X_2, \dots, X_n$ is a set of independent random variables having
mean 0 and variances $\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2$.

Let $C_n^2 = \sigma_1^2 + \sigma_2^2 + \dots + \sigma_n^2$.

Then the probability that all of the inequalities

$$|x_1 + x_2 + \dots + x_\alpha| < \lambda C_n, \quad \alpha = 1, 2, \dots, n$$

hold, is at least $\left(1 - \frac{1}{\lambda^2}\right)$.
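
The bound in Kolmogorov's inequality can be checked by simulation. The sketch below is illustrative only and rests on assumptions not in the original text: the variables are taken to be independent normal variates with mean zero and the given standard deviations, and the function name `kolmogorov_check`, the choice λ = 2 and the seed are hypothetical.

```python
import math
import random


def kolmogorov_check(sigmas, lam=2.0, reps=5000, seed=2):
    """Estimate P(|x_1 + ... + x_a| < lam * C_n for every a = 1..n) by simulation."""
    rng = random.Random(seed)
    c_n = math.sqrt(sum(s * s for s in sigmas))  # C_n^2 is the sum of the variances
    hits = 0
    for _ in range(reps):
        partial, ok = 0.0, True
        for s in sigmas:
            partial += rng.gauss(0.0, s)  # independent, mean-zero terms
            if abs(partial) >= lam * c_n:
                ok = False
                break
        hits += ok
    return hits / reps


if __name__ == "__main__":
    estimate = kolmogorov_check([1.0] * 50)
    print(estimate, ">=", 1.0 - 1.0 / 2.0 ** 2)
```

The estimated probability should be at least $1 - 1/\lambda^2 = 0.75$; in practice it is usually well above the bound, since the inequality is not tight for normal variates.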


14.2.1 Weak Law of Large Numbers 


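If you put,

$$x = \frac{x_1 + x_2 + \dots + x_n}{n}, \qquad \bar{x} = \frac{\bar{x}_1 + \bar{x}_2 + \dots + \bar{x}_n}{n}$$

and

$$\lambda = \frac{n\delta}{\sqrt{B_n}},$$

where δ is an arbitrary positive number, $\bar{x}_i$ is the expectation of $x_i$, and $B_n$ is
the mathematical expectation of the variate

$$u = (x_1 + x_2 + \dots + x_n - \bar{x}_1 - \bar{x}_2 - \dots - \bar{x}_n)^2$$

i.e.

$$B_n = E\left[(x_1 + x_2 + \dots + x_n - \bar{x}_1 - \bar{x}_2 - \dots - \bar{x}_n)^2\right]$$

Then,

$$P\left\{\left|\frac{x_1 + x_2 + \dots + x_n}{n} - \frac{\bar{x}_1 + \bar{x}_2 + \dots + \bar{x}_n}{n}\right| < \delta\right\} \ge 1 - \eta \quad \text{provided} \quad \frac{B_n}{n^2\delta^2} < \eta.$$


This is known as the Weak Law of Large Numbers. This can also be stated as follows:


With the probability approaching unity or certainty as near as you please,
you can expect that the arithmetic mean of the values actually assumed by n variates
will differ from the mean of their expectations by less than any given number
however small, provided the number of variates can be taken sufficiently large and
provided the condition $\frac{B_n}{n^2} \to 0$ as n → ∞ is fulfilled.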
n 


In other words, a sequence $X_1, X_2, \dots, X_n$ of random variables is said to
satisfy the weak law of large numbers if


$$\lim_{n \to \infty} P\left[\left|\frac{S_n}{n} - E\left(\frac{S_n}{n}\right)\right| < \varepsilon\right] = 1$$


for any ε > 0, where $S_n = X_1 + X_2 + \dots + X_n$. The law holds provided
$\frac{B_n}{n^2} \to 0$ as n → ∞, where $B_n = \operatorname{Var}(S_n) < \infty$.

14.2.2 Strong Law of Large Numbers 


Consider the sequence $X_1, X_2, \dots, X_n$ of independent random variables with
expectation $\bar{x}_k$ or $\mu_k = E(X_k)$ and variance $\sigma_k^2$.


If $S_n = X_1 + X_2 + \dots + X_n$ and $E(S_n) = m_n$, then it can be said that the sequence
$S_1, S_2, \dots, S_n$ obeys the strong law of large numbers if every pair ε > 0, δ > 0
corresponds to N such that there is a probability (1 − δ) or better that for every
r > 0 all r + 1 inequalities,


$$\frac{|S_n - m_n|}{n} < \varepsilon, \quad n = N, N+1, \dots, N+r$$

will be satisfied.
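
The strong-law condition concerns whole stretches of the sequence staying close to the mean at once, not just a single n. A minimal simulation sketch follows; it assumes, purely for illustration, that the $X_k$ are fair coin flips (so $m_n = np$ with p = 0.5), and the function name, N = 2000, r = 3000, ε = 0.05 and the seed are hypothetical choices.

```python
import random


def fraction_of_paths_within(eps=0.05, start=2000, extra=3000, paths=200, p=0.5, seed=3):
    """Fraction of simulated coin-flip paths with |S_n / n - p| < eps
    for every n = start, start + 1, ..., start + extra."""
    rng = random.Random(seed)
    good = 0
    for _ in range(paths):
        s, ok = 0, True
        for n in range(1, start + extra + 1):
            s += 1 if rng.random() < p else 0
            # |S_n - m_n| / n with m_n = n * p is the same as |S_n / n - p|.
            if n >= start and abs(s / n - p) >= eps:
                ok = False
                break
        good += ok
    return good / paths


if __name__ == "__main__":
    print(fraction_of_paths_within())
```

Nearly every simulated path should satisfy all r + 1 inequalities simultaneously, which corresponds to the probability (1 − δ) or better in the definition.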


Example 1: Examine whether the weak law of large numbers holds for the
sequence {$X_k$} of independent random variables defined as follows:


$$P\left(X_k = \pm 2^k\right) = 2^{-(2k+1)}$$
$$P\left(X_k = 0\right) = 1 - 2^{-2k}$$

Solution:

$$E(X_k) = \sum x_k p_k = 2^k \times 2^{-(2k+1)} + (-2^k) \times 2^{-(2k+1)} + 0 \times \left(1 - 2^{-2k}\right) = 2^{-(2k+1)}\left[2^k - 2^k\right] = 0.$$

$$E(X_k^2) = \sum x_k^2 p_k = (2^k)^2 \times 2^{-(2k+1)} + (-2^k)^2 \times 2^{-(2k+1)} + 0^2 \times \left(1 - 2^{-2k}\right) = 2^{-(2k+1)}\left[2^{2k} + 2^{2k}\right] = 2^{-1} + 2^{-1} = 1.$$

$$\operatorname{Var}(X_k) = E(X_k^2) - [E(X_k)]^2 = 1 - 0 = 1$$

$$B_n = \sum_{i=1}^{n} \operatorname{Var}(X_i) = \sum_{i=1}^{n} 1 = n$$

$$\lim_{n \to \infty} \frac{B_n}{n^2} = \lim_{n \to \infty} \frac{n}{n^2} = \lim_{n \to \infty} \frac{1}{n} = 0$$


Hence, the weak law of large numbers holds for the sequence {$X_k$} of
independent random variables.

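Example 2: Examine whether the laws of large numbers hold for the sequence
{$X_k$} of independent random variables which are defined as follows:


$$P\left(X_k = \pm k^{-1/2}\right) = \frac{1}{2}$$


Solution:

$$E(X_k) = \sum x_k p_k = k^{-1/2} \times \frac{1}{2} + \left(-k^{-1/2}\right) \times \frac{1}{2} = 0$$

$$E(X_k^2) = \sum x_k^2 p_k = \left(k^{-1/2}\right)^2 \times \frac{1}{2} + \left(-k^{-1/2}\right)^2 \times \frac{1}{2} = \frac{k^{-1}}{2} + \frac{k^{-1}}{2} = k^{-1}$$

$$\operatorname{Var}(X_k) = E(X_k^2) - [E(X_k)]^2 = k^{-1} - 0 = k^{-1}$$

$$B_n = \sum_{i=1}^{n} \operatorname{Var}(X_i) = \sum_{i=1}^{n} i^{-1} \le n$$

$$\lim_{n \to \infty} \frac{B_n}{n^2} \le \lim_{n \to \infty} \frac{n}{n^2} = 0$$


Hence, the laws of large numbers hold for the sequence {$X_k$} of independent
random variables.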
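
The condition $B_n/n^2 \to 0$ used in both examples can be checked with exact arithmetic. The sketch below is an illustration, not part of the original solutions. It assumes Example 1 takes the values ±2^k with probability 2^−(2k+1) each, and that Example 2 takes the values ±k^(−1/2) with probability 1/2 each; the latter exponent is a reading of the printed example and should be treated as an assumption. All function names are hypothetical.

```python
from fractions import Fraction


def variance_example_1(k):
    """Var(X_k) when P(X_k = +2^k) = P(X_k = -2^k) = 2^-(2k+1), P(X_k = 0) = 1 - 2^-(2k)."""
    p = Fraction(1, 2 ** (2 * k + 1))
    mean = 2 ** k * p + (-(2 ** k)) * p
    second_moment = (2 ** k) ** 2 * p + (-(2 ** k)) ** 2 * p
    return second_moment - mean ** 2


def variance_example_2(k):
    """Var(X_k) when P(X_k = +k^(-1/2)) = P(X_k = -k^(-1/2)) = 1/2."""
    # The mean is 0 by symmetry, so the variance is E(X_k^2) = (k^(-1/2))^2 = 1/k.
    return Fraction(1, k)


def ratio(variance, n):
    """B_n / n^2, where B_n is the sum of the first n variances."""
    b_n = sum(variance(k) for k in range(1, n + 1))
    return b_n / n ** 2


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, float(ratio(variance_example_1, n)), float(ratio(variance_example_2, n)))
```

Both columns shrink towards 0 as n grows, which is the sufficient condition for the weak law stated earlier in this unit.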


Check Your Progress 


1. Who defined the set of Kolmogorov’s inequalities and when?


2. Under what circumstances can the normal approximation be applied to the binomial
and Poisson distributions?


14.3 ANSWERS TO CHECK YOUR PROGRESS
QUESTIONS


1. The set of Kolmogorov’s inequalities was defined by Kolmogorov in 1928.


2. When the number of trials is large and the probability p is close to 1/2, the normal
approximation can be used for the binomial distribution; for the Poisson distribution,
the normal approximation applies when the mean is large.


14.4 SUMMARY 


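• The set of Kolmogorov’s inequalities was defined by Kolmogorov in 1928.

• Suppose $X_1, X_2, \dots, X_n$ is a set of independent random variables having
mean 0 and variances $\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2$.


Let $C_n^2 = \sigma_1^2 + \sigma_2^2 + \dots + \sigma_n^2$.
Then the probability that all of the inequalities

$$|x_1 + x_2 + \dots + x_\alpha| < \lambda C_n, \quad \alpha = 1, 2, \dots, n$$

hold, is at least $\left(1 - \frac{1}{\lambda^2}\right)$.


• A sequence $X_1, X_2, \dots, X_n$ of random variables is said to satisfy the weak
law of large numbers if

$$\lim_{n \to \infty} P\left[\left|\frac{S_n}{n} - E\left(\frac{S_n}{n}\right)\right| < \varepsilon\right] = 1$$

for any ε > 0, where $S_n = X_1 + X_2 + \dots + X_n$.

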
14.5 KEY WORDS 
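• Strong law of large numbers: If $S_n = X_1 + X_2 + \dots + X_n$ and $E(S_n) = m_n$,
then it can be said that the sequence $S_1, S_2, \dots, S_n$ obeys the strong law of
large numbers if every pair ε > 0, δ > 0 corresponds to N such that there is a
probability (1 − δ) or better that for every r > 0 all r + 1 inequalities
$\frac{|S_n - m_n|}{n} < \varepsilon$, n = N, N + 1, ..., N + r will be satisfied.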


14.6 SELF-ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 


1. Define Kolmogorov’s inequality.


2. What is the strong law of large numbers?

Long-Answer Questions


1. Briefly describe the weak law of large numbers.


2. Discuss the strong law of large numbers.